ShouWenWang-Lab / snakemake_DARLIN


Does the python pipeline include sequence alignment and allele calling functions? #8

Open peipp410 opened 3 weeks ago

peipp410 commented 3 weeks ago

Hi! Thanks for providing a Python-based version of the CARLIN pipeline. We want to modify the arguments to run it on our own sequencing data and reference from a different experimental protocol. However, I can't locate the function for sequence alignment. It seems that MosaicLineage directly takes the MATLAB object from the original CARLIN pipeline as input? Thanks!

ShouWenWang commented 3 weeks ago

snakemake_DARLIN offers two ways to analyze CARLIN data. 1. The MATLAB way. It relies on the MATLAB CARLIN pipeline. To modify the reference sequence, you need to change it here: https://github.com/ShouWenWang-Lab/Custom_CARLIN/tree/main/%40CARLIN_def https://github.com/ShouWenWang-Lab/Custom_CARLIN/tree/main/cfg This change is substantial and challenging.

2. There is also a Python-based method. This method, however, does not call mutations in the sequence the way the MATLAB method does. It only identifies the DARLIN sequence and uses distinct sequences to call distinct clones. We implemented a single-cell version, but adapting it to bulk should also be easy. https://github.com/ShouWenWang-Lab/snakemake_DARLIN/blob/master/QC/single_cell_DARLIN-10x.ipynb
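To illustrate what "using distinct sequences to call distinct clones" could look like on bulk data, here is a minimal sketch. It is not the code from the notebook above: the function name `call_clones`, the `min_reads` threshold, and the toy reads are all hypothetical, and it assumes the DARLIN sequence has already been extracted from each read.

```python
from collections import Counter


def call_clones(sequences, min_reads=2):
    """Group reads by exact sequence identity and label each group as a clone.

    `sequences` is any iterable of already-extracted DARLIN sequences
    (an assumption of this sketch). `min_reads` is a hypothetical filter
    that drops sequences seen too rarely to trust.
    """
    # Count how many reads support each distinct sequence (case-insensitive).
    counts = Counter(seq.upper() for seq in sequences if seq)
    # Keep only sequences with enough supporting reads.
    kept = {seq: n for seq, n in counts.items() if n >= min_reads}
    # Order by read count (descending), then alphabetically for stable labels.
    ordered = sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))
    return {seq: (f"clone_{i}", n) for i, (seq, n) in enumerate(ordered, start=1)}


# Toy input: three reads of one sequence, two of another, one singleton.
reads = ["ACGT", "ACGT", "ACGA", "acgt", "TTTT", "TTTT"]
clones = call_clones(reads, min_reads=2)
for seq, (label, n_reads) in clones.items():
    print(label, seq, n_reads)
```

Exact matching treats every sequencing error as a new clone, so a real bulk analysis would probably need a read-count cutoff like the one above or some form of error correction.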

-- Shou-Wen Wang, Ph.D. Principal Investigator School of Life Sciences | School of Science Westlake University Shilongshan ST #18, Xihu, Hangzhou, Zhejiang https://www.shouwenwang-lab.com/

peipp410 commented 3 weeks ago

Thank you! I'll try that.

peipp410 commented 1 week ago

Hello, I encountered another problem. Is this process suitable for large files? I have paired-end sequencing files, each about 2 GB. Even when running the script on a 64-core server, the following error occurs.

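[Screenshot of the error, 2024-06-28 18:08:17: https://github.com/ShouWenWang-Lab/snakemake_DARLIN/assets/59289157/a3845fb0-901c-4106-ad71-50734ccb898d]
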
ShouWenWang commented 1 week ago

I have not encountered this problem before, but the preprocessing can be resource intensive. You could split your data into several parts, process each part separately, and merge the final results.
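A rough sketch of that split-and-merge approach is below. It is an illustration only, not part of snakemake_DARLIN: the helper names (`split_paired_fastq`, `merge_counts`), the chunk file naming, and the chunk size are assumptions. It assumes standard four-line FASTQ records and that the per-chunk results are additive, for example read counts per sequence.

```python
import gzip
from collections import Counter
from itertools import islice
from pathlib import Path


def _open_text(path):
    """Open a FASTQ file for reading as text, transparently handling .gz files."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def split_paired_fastq(r1_path, r2_path, out_dir, reads_per_chunk=500_000):
    """Split paired-end FASTQ files into chunks, keeping R1 and R2 in sync.

    Returns a list of (R1 chunk path, R2 chunk path) tuples.
    Chunk names such as chunk0000_R1.fastq are a hypothetical convention.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines_per_chunk = 4 * reads_per_chunk  # one FASTQ record = 4 lines
    chunks = []
    with _open_text(r1_path) as r1, _open_text(r2_path) as r2:
        index = 0
        while True:
            block1 = list(islice(r1, lines_per_chunk))
            block2 = list(islice(r2, lines_per_chunk))
            if not block1 and not block2:
                break
            # Both mates must contribute the same number of complete records.
            if len(block1) != len(block2) or len(block1) % 4 != 0:
                raise ValueError("R1/R2 are out of sync or a record is truncated")
            out1 = out_dir / f"chunk{index:04d}_R1.fastq"
            out2 = out_dir / f"chunk{index:04d}_R2.fastq"
            out1.write_text("".join(block1))
            out2.write_text("".join(block2))
            chunks.append((out1, out2))
            index += 1
    return chunks


def merge_counts(tables):
    """Sum per-chunk {sequence: read count} tables into one table."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total
```

Each chunk pair can then be run through the preprocessing independently. Summing counts is only valid for additive outputs; anything that depends on seeing all reads at once, such as UMI or cell-barcode deduplication, would need to be redone on the merged data.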

peipp410 commented 1 week ago

OK! Thank you!

jzussman commented 1 day ago

Would it be possible for you to provide a bulk version of the python method for allele calling? Alternatively, if you don't have the bandwidth at the moment, could you instruct me on how I could most easily adapt the existing python pipeline to bulk sequence data?

In addition, would adapting the existing python-based single-cell pipeline to the newer 10xV4 chemistry be as simple as adding in the 10xV4 barcode list to /reference? What else would be required to complete this configuration? I have successfully generated this config using the MATLAB CARLIN pipeline version, but am not sure how I would do so in the python version.