MtbEvolution / resR_Project

2 stars 0 forks source link

Inconsistency between the parameters in the Fixed_SNPs_Calling script and the supplementary material of the article #1

Open Dx-wmc opened 1 year ago

Dx-wmc commented 1 year ago

Dear Sir, I am very impressed by the author's work and the depth of the analysis in your article. Also, I would like to thank you for sharing the scripts. However, I just started using your Fixed_SNPs_Calling script, but I was surprised to find that the parameters were inconsistent with the article, for example, the article's supplementary material says Fixed SNPs with a frequency of ≥ 90% and at least 10 supporting reads were identified using VarScan (2.2.3.9). with the strand bias filter on , while in the scripts (0_SNP_Calling.sh), java -jar ~/software/VarScan.v2.3.9.jar mpileup2snp Sample.pileup --min-coverage 3 --min-reads2 2 --min-avg-qual 20 --min-var-freq 0.01 --min-freq-for-hom 0.9 --p-value 99e-02 --strand-filter 1 > Sample.varscan.And I noticed that the SNP_calling code in the Fixed_SNPs_Calling is the first half of the SNP_calling code in the Unfixed_SNPs_Calling, which seems to be a problem. Above is the small problem that I just met, I know the author's scientific research work is hectic, hope to be able to take some time to answer, wish to receive your reply as soon as possible! Yours, Dave

MtbEvolution commented 1 year ago

Hi Dave,

Thanks for pointing out this inconsistent place and the nice comments.

For the Varscan2 step, we lowered the threshold for coverage in the beginning to capture as many unfixed SNPs as possible. In a later step, the script “2_fix_extract.pl” is supposed to set the coverage threshold for fixed SNPs to “10”. However, I noticed that I uploaded an older version of that script. I am going to update that the first thing tomorrow morning.

Yes, the fixed SNPs calling share the first half scrip with unfixed SNPs calling (from fastq to varscan, and to cns, ppe, etc.). The difference in unfixed SNPs calling is that we need set a bunch of criteria to filter those false positives, especially for genomes of Mtb.

Thank you again for the kind reminding! Please do let me know if there is any other question.

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 4, 2023, at 22:09, Dave @.**@.>> wrote:

Dear Sir, I am very impressed by the author's work and the depth of the analysis in your article. Also, I would like to thank you for sharing the scripts. However, I just started using your Fixed_SNPs_Calling script, but I was surprised to find that the parameters were inconsistent with the article, for example, the article's supplementary material says Fixed SNPs with a frequency of ≥ 90% and at least 10 supporting reads were identified using VarScan (2.2.3.9). with the strand bias filter on , while in the scripts (0_SNP_Calling.sh), java -jar ~/software/VarScan.v2.3.9.jar mpileup2snp Sample.pileup --min-coverage 3 --min-reads2 2 --min-avg-qual 20 --min-var-freq 0.01 --min-freq-for-hom 0.9 --p-value 99e-02 --strand-filter 1 > Sample.varscan.And I noticed that the SNP_calling code in the Fixed_SNPs_Calling is the first half of the SNP_calling code in the Unfixed_SNPs_Calling, which seems to be a problem. Above is the small problem that I just met, I know the author's scientific research work is hectic, hope to be able to take some time to answer, wish to receive your reply as soon as possible! Yours, Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=jtgUmP1LmVYkZOHdR9ot1sisN45p4g7SrNrwtGi3ixWSyWLtqnWzrS54kmImFGaH&s=HNB60HlNz7Nc295BVTcEnbKmbQb7717B3ejBTu1mFHI&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOQJDYC6NFJOFRG332TWQY3QFANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=jtgUmP1LmVYkZOHdR9ot1sisN45p4g7SrNrwtGi3ixWSyWLtqnWzrS54kmImFGaH&s=S5o7hb1YwXyjoxMbmUJC40my-K29csjX0qPD2DH0uG8&e=. You are receiving this because you are subscribed to this thread.Message ID: @.***>

MtbEvolution commented 1 year ago

Hi Dave,

I have just updated the “fix_extract.pl” script.

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 4, 2023, at 23:32, Qingyun Liu @.**@.>> wrote:

Hi Dave,

Thanks for pointing out this inconsistent place and the nice comments.

For the Varscan2 step, we lowered the threshold for coverage in the beginning to capture as many unfixed SNPs as possible. In a later step, the script “2_fix_extract.pl” is supposed to set the coverage threshold for fixed SNPs to “10”. However, I noticed that I uploaded an older version of that script. I am going to update that the first thing tomorrow morning.

Yes, the fixed SNPs calling share the first half scrip with unfixed SNPs calling (from fastq to varscan, and to cns, ppe, etc.). The difference in unfixed SNPs calling is that we need set a bunch of criteria to filter those false positives, especially for genomes of Mtb.

Thank you again for the kind reminding! Please do let me know if there is any other question.

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 4, 2023, at 22:09, Dave @.**@.>> wrote:

Dear Sir, I am very impressed by the author's work and the depth of the analysis in your article. Also, I would like to thank you for sharing the scripts. However, I just started using your Fixed_SNPs_Calling script, but I was surprised to find that the parameters were inconsistent with the article, for example, the article's supplementary material says Fixed SNPs with a frequency of ≥ 90% and at least 10 supporting reads were identified using VarScan (2.2.3.9). with the strand bias filter on , while in the scripts (0_SNP_Calling.sh), java -jar ~/software/VarScan.v2.3.9.jar mpileup2snp Sample.pileup --min-coverage 3 --min-reads2 2 --min-avg-qual 20 --min-var-freq 0.01 --min-freq-for-hom 0.9 --p-value 99e-02 --strand-filter 1 > Sample.varscan.And I noticed that the SNP_calling code in the Fixed_SNPs_Calling is the first half of the SNP_calling code in the Unfixed_SNPs_Calling, which seems to be a problem. Above is the small problem that I just met, I know the author's scientific research work is hectic, hope to be able to take some time to answer, wish to receive your reply as soon as possible! Yours, Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=jtgUmP1LmVYkZOHdR9ot1sisN45p4g7SrNrwtGi3ixWSyWLtqnWzrS54kmImFGaH&s=HNB60HlNz7Nc295BVTcEnbKmbQb7717B3ejBTu1mFHI&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOQJDYC6NFJOFRG332TWQY3QFANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=jtgUmP1LmVYkZOHdR9ot1sisN45p4g7SrNrwtGi3ixWSyWLtqnWzrS54kmImFGaH&s=S5o7hb1YwXyjoxMbmUJC40my-K29csjX0qPD2DH0uG8&e=. You are receiving this because you are subscribed to this thread.Message ID: @.***>

Dx-wmc commented 1 year ago

Hi Dave, Thanks for pointing out this inconsistent place and the nice comments. For the Varscan2 step, we lowered the threshold for coverage in the beginning to capture as many unfixed SNPs as possible. In a later step, the script “2_fix_extract.pl” is supposed to set the coverage threshold for fixed SNPs to “10”. However, I noticed that I uploaded an older version of that script. I am going to update that the first thing tomorrow morning. Yes, the fixed SNPs calling share the first half scrip with unfixed SNPs calling (from fastq to varscan, and to cns, ppe, etc.). The difference in unfixed SNPs calling is that we need set a bunch of criteria to filter those false positives, especially for genomes of Mtb. Thank you again for the kind reminding! Please do let me know if there is any other question. Best, Qingyun

Dear Qingyun,

Thank you very much for your answer to my questions. In addition to the above problem, I also encountered some problems when I ran the scripts in Unfixed_SNPs_Calling. I used one of the data you used (DRR034341) for the test, however, the DRR034341.mixforkept file is empty, which is inconsistent with the results in your All_unfixed_mutation.txt file (Lines 2-9).

Also, I compared the SNP sites in the DRR034341.mixfor file with sites in the All_unfixed_mutation.txt, some sites existing in All_unfixed_mutation. txt do not exist in the DRR034341.mixfor file. Moreover, when I examine the depth file, I get a depth (20) that is significantly different from the 36 described in your 51229_Mtb_Seq_Info.txt file. I've run the script many times, but it's all the same result. I don't know what went wrong. I put the script I ran and the corresponding results in the attachment zip, If you have free time, please check it out.

I'm sorry to bother you again, and I sincerely hope you can reply.

Yours,

Dave DRR034341.tar.gz

MtbEvolution commented 1 year ago

Hi Dave,

I will be very happy to help you walk through this and and also check if there is any details that I haven’t updated.

Here attached the forup and keptfile file from my analysis, DRR034341 is not empty.

For coverage calculation, I wonder if we have different ways of calculating depth? I did notice different tools would result in slightly different depth calculations.

Would you send me your running script and I can help you check both?

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 07:27, Dave @.**@.>> wrote:

Hi Dave, Thanks for pointing out this inconsistent place and the nice comments. For the Varscan2 step, we lowered the threshold for coverage in the beginning to capture as many unfixed SNPs as possible. In a later step, the script “2_fix_extract.pl” is supposed to set the coverage threshold for fixed SNPs to “10”. However, I noticed that I uploaded an older version of that script. I am going to update that the first thing tomorrow morning. Yes, the fixed SNPs calling share the first half scrip with unfixed SNPs calling (from fastq to varscan, and to cns, ppe, etc.). The difference in unfixed SNPs calling is that we need set a bunch of criteria to filter those false positives, especially for genomes of Mtb. Thank you again for the kind reminding! Please do let me know if there is any other question. Best, Qingyun

Dear Qingyun,

Thank you very much for your answer to my questions. In addition to the above problem, I also encountered some problems when I ran the scripts in Unfixed_SNPs_Calling. I used one of the data you used (DRR034341) for the test, however, the DRR034341.mixforkept file is empty, which is inconsistent with the results in your All_unfixed_mutation.txt file (Lines 2-9).

Also, I compared the SNP sites in the DRR034341.mixfor file with sites in the All_unfixed_mutation.txt, some sites existing in All_unfixed_mutation. txt do not exist in the DRR034341.mixfor file. Moreover, when I examine the depth file, I get a depth (20) that is significantly different from the 36 described in your 51229_Mtb_Seq_Info.txt file. I've run the script many times, but it's all the same result. I don't know what went wrong. I put the script I ran and the corresponding results in the attachment zip, If you have free time, please check it out.

I'm sorry to bother you again, and I sincerely hope you can reply.

Yours,

Dave DRR034341.tar.gzhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10352104_DRR034341.tar.gz&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=4WOGX7AdG80dwAIDgnZy2lhrzR_RH4uq9iKTHRUXXpeOs_7A9L9_UWJZwi_dgkck&s=ws5NXaBqKZ5LzM6YCZGqdZ0t-HaiAYmYGbOUqThVluQ&e=

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1372153624&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=4WOGX7AdG80dwAIDgnZy2lhrzR_RH4uq9iKTHRUXXpeOs_7A9L9_UWJZwi_dgkck&s=YQ0T2iJso9RE_b6d0_op6QfJdcXFoJcZ6vqqMbH0I_I&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOXJFW7FQQNOA3JVJ23WQ24ZPANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=4WOGX7AdG80dwAIDgnZy2lhrzR_RH4uq9iKTHRUXXpeOs_7A9L9_UWJZwi_dgkck&s=BS397ynZOY2w0vDCWhFN0MKtdwJtBn8xIt-7dqMB3J4&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Hi Dave, I will be very happy to help you walk through this and and also check if there is any details that I haven’t updated. Here attached the forup and keptfile file from my analysis, DRR034341 is not empty. For coverage calculation, I wonder if we have different ways of calculating depth? I did notice different tools would result in slightly different depth calculations. Would you send me your running script and I can help you check both? Best, Qingyun

Dear Qingyun,

I'm sorry that I didn't see your forup and keptfile file your analyzed in your last reply.

For the calculation of depth, I used the script you provided and the calculation method ofsamtools, respectively, the script I used and the forup and the depth file generated by the script are attached here, please check them!

unfix_SNP_script.txt DRR034341.forup.txt DRR034341.dep.txt

Thank you sincerely for your help.

Yours,

Dave

MtbEvolution commented 1 year ago

Hi Dave,

Got it! I am quite busy today but will test the script as soon as possible and get back to you!

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 08:28, Dave @.**@.>> wrote:

Hi Dave, I will be very happy to help you walk through this and and also check if there is any details that I haven’t updated. Here attached the forup and keptfile file from my analysis, DRR034341 is not empty. For coverage calculation, I wonder if we have different ways of calculating depth? I did notice different tools would result in slightly different depth calculations. Would you send me your running script and I can help you check both? Best, Qingyun

Dear Qingyun,

I'm sorry that I didn't see your forup and keptfile file your analyzed in your last reply.

For the calculation of depth, I used the script you provided and the calculation method ofsamtools, respectively, the script I used and the forup and the depth file generated by the script are attached here, please check them!

unfix_SNP_script.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10352494_unfix-5FSNP-5Fscript.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=3a-UnQ2h5devHZLKVzgTXlDpzsDVdzPfS0Xzvg_gX1EpxC6oxTM9rwXYi6ALaPsj&s=6AzDGybA6IHFOL_hmHJ5k3kcIbRvl5Pf7N37aZIVp3U&e= DRR034341.forup.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10352496_DRR034341.forup.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=3a-UnQ2h5devHZLKVzgTXlDpzsDVdzPfS0Xzvg_gX1EpxC6oxTM9rwXYi6ALaPsj&s=K0_yaRVDhqmCzRKa9OwbFbDBxXTsNkoN_bW7yGFzJDo&e= DRR034341.dep.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10352497_DRR034341.dep.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=3a-UnQ2h5devHZLKVzgTXlDpzsDVdzPfS0Xzvg_gX1EpxC6oxTM9rwXYi6ALaPsj&s=YxJEg-Tuh88QUxaZgDll6tlfSHorE8JgLyRWlVQj9CE&e=

Thank you sincerely for your help.

Yours,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1372216565&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=3a-UnQ2h5devHZLKVzgTXlDpzsDVdzPfS0Xzvg_gX1EpxC6oxTM9rwXYi6ALaPsj&s=OWnNvSFq3b77irV_XOezlWogZ7z3uYD8xqUhemBEq7M&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOQRKK5GBNNK4D65O3LWQ3EA7ANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=3a-UnQ2h5devHZLKVzgTXlDpzsDVdzPfS0Xzvg_gX1EpxC6oxTM9rwXYi6ALaPsj&s=yb4uxIv7jOlPamqTFIaLe2ifEf5BcdmwCqEz4YyE0lA&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Hi Dave, Got it! I am quite busy today but will test the script as soon as possible and get back to you! Best, Qingyun

Dear Qingyun,

I would like to express my gratitude again and hope to receive your reply in your free time.

Your,

Dave

MtbEvolution commented 1 year ago

Hi Dave,

It’s my pleasure! Would you "ls -lth” all the files you generated and send that output to me (including fastq)?

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 08:39, Dave @.**@.>> wrote:

Hi Dave, Got it! I am quite busy today but will test the script as soon as possible and get back to you! Best, Qingyun

Dear Qingyun,

I would like to express my gratitude again and hope to receive your reply in your free time.

Your,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1372227450&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=wAJUoQr3NI4w7ux5QHmRw6HsZzKP99ebvk17C-wIzMy9tgaWCv09Gao52LyvEiec&s=PhGqbSxpIyTT5CoY6DRjsOfV2PZW95ne6Y3mo8Lp6vI&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOV22TOWV4AXOQOSSNLWQ3FKRANCNFSM6AAAAAATRO4454&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=wAJUoQr3NI4w7ux5QHmRw6HsZzKP99ebvk17C-wIzMy9tgaWCv09Gao52LyvEiec&s=By20fDQSYby3V6GlaSTKAoQGZX9VzOcTg4xe1gkEXMw&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Hi Dave, It’s my pleasure! Would you "ls -lth” all the files you generated and send that output to me (including fastq)? Best, Qingyun

Dear Qingyun,

Thank you for helping me solve the problem in your busy schedule.

I have listed all the files in my folder and put them in the following file. Please check it. file_list.txt

Your,

Dave

MtbEvolution commented 1 year ago

Hi Dave,

I just reran this isolate’s data, and got results below. Seems like your fileup file size is much smaller than mine, might explain why you got lower depth.

Any chance you know what was happening there?

12K DRR034341.snp 93K DRR034341.fix 489K DRR034341.for 501K DRR034341.ppe 342M DRR034341.cns 541K DRR034341.varscan 929M DRR034341.pileup 14K DRR034341.sort.bam.bai 127M DRR034341.sort.bam 169M DRR034341.merge.bam 2.0M DRR034341.single.bam 168M DRR034341.paired.bam 529M DRR034341.paired.sam 5.2M DRR034341.single.sam 334K DRR034341_S.sai 15M DRR034341_R2.sai 15M DRR034341_R1.sai 222M DRR034341_tr_1.fq 202M DRR034341_tr_2.fq 4.3M DRR034341_tr_S.fq 2.0K run.sh 1.9K mapping.sh 90M DRR034341_1.fastq.gz 97M DRR034341_2.fastq.gz 4.0K DRR034341

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 10:28, Dave @.**@.>> wrote:

Hi Dave, It’s my pleasure! Would you "ls -lth” all the files you generated and send that output to me (including fastq)? Best, Qingyun

Dear Qingyun,

Thank you for helping me solve the problem in your busy schedule.

I have listed all the files in my folder and put them in the following file. Please check it. file_list.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10353353_file-5Flist.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=FQpwQ7-trMB9icp2ipmgQbHGni5TMOk3TRyRm6MiPNHhY7InBrI4r0DJ4SAh3N3Q&s=c3-9dQnIqAUKhBujy9WtJeIYHlDeIQVRccls6T5r-1Q&e=

Your,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1372365065&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=FQpwQ7-trMB9icp2ipmgQbHGni5TMOk3TRyRm6MiPNHhY7InBrI4r0DJ4SAh3N3Q&s=cDCnDRabMx_3XfRBYmT6lNjl18QzrOLL6rn07RDwF9w&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOUFDOSVFJ26MPII3QLWQ3SBJANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=FQpwQ7-trMB9icp2ipmgQbHGni5TMOk3TRyRm6MiPNHhY7InBrI4r0DJ4SAh3N3Q&s=wlujrEVuXTvRIxsyFkUw_kJ18jw25cPyBpQFo2iymjA&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Hi Dave, I just reran this isolate’s data, and got results below. Seems like your fileup file size is much smaller than mine, might explain why you got lower depth. Any chance you know what was happening there? 12K DRR034341.snp 93K DRR034341.fix 489K DRR034341.for 501K DRR034341.ppe 342M DRR034341.cns 541K DRR034341.varscan 929M DRR034341.pileup 14K DRR034341.sort.bam.bai 127M DRR034341.sort.bam 169M DRR034341.merge.bam 2.0M DRR034341.single.bam 168M DRR034341.paired.bam 529M DRR034341.paired.sam 5.2M DRR034341.single.sam 334K DRR034341_S.sai 15M DRR034341_R2.sai 15M DRR034341_R1.sai 222M DRR034341_tr_1.fq 202M DRR034341_tr_2.fq 4.3M DRR034341_tr_S.fq 2.0K run.sh 1.9K mapping.sh 90M DRR034341_1.fastq.gz 97M DRR034341_2.fastq.gz 4.0K DRR034341 Best, Qingyun

Dear Qingyun,

Thank you for taking the time to run the code and troubleshoot for me.

I re-compared the scripts and found a place that was somewhat different, in the two scripts of Unfix and Fix, although they are almost the same, there is a line of parameters that are not the same. In the Fix bash script, the parameter is -ABOf samtools mpileup -q 30 -Q 20 -ABOf ~/database/mtb/template/bwa/tb.ancestor.fasta ${Sample}.sort.bam > ${Sample}.pileup, while in Unfix script, samtools mpileup -q 30 -Q 20 -BOf ~/database/mtb/template/bwa/tb.ancestor.fasta ${Sample}.sort.bam > ${Sample}.pileup, the parameter is reduced by one A.

I ran the code of Fix_SNPs, and the size of pileup file is the same as your. Should the parameter in the Unfix script be the "-ABOf"?

However, the mixforkept file is still empty when I continue to run with the next steps of the unfix script. The size of the file and the scripts are listed below.

341 MB DRR034341.cns 20 B DRR034341.dep 91 KB DRR034341.fix 456 KB DRR034341.for 1.5 MB DRR034341.forup 168 MB DRR034341.merge.bam 11 KB DRR034341.mix 9.1 KB DRR034341.mixfor 9.3 KB DRR034341.mixfordisc 0 B DRR034341.mixforkept 166 MB DRR034341.paired.bam 529 MB DRR034341.paired.sam 920 MB DRR034341.pileup 470 KB DRR034341.ppe 1.9 MB DRR034341.single.bam 5.1 MB DRR034341.single.sam 12 KB DRR034341.snp 126 MB DRR034341.sort.bam 14 KB DRR034341.sort.bam.bai 530 KB DRR034341.varscan 15 MB DRR034341_R1.sai 15 MB DRR034341_R2.sai 334 KB DRR034341_S.sai 222 MB DRR034341_tr_1.fq 201 MB DRR034341_tr_2.fq 4.3 MB DRR034341_tr_S.fq 3.0 KB unfix.sh unfix.sh.txt

If you run all the scripts and get the new results, can you send me the scripts you run? I'd like to compare the difference between the two. Also let us know the version of bwa and samtools you use.

Once again, I express my sincere gratitude to you.

Yours, Dave

MtbEvolution commented 1 year ago

Dear Dave,

Thanks for comparing that and so happy you find out the cause! Good job!

Yes! Do include “-A”, we don’t want to lose those orphan reads! (-A, --count-orphans Do not skip anomalous read pairs in variant calling. Anomalous read pairs are those marked in the FLAG field as paired in sequencing but without the properly-paired flag set.)

I will unify the scripts on Github shortly!

Please don’t hesitate to contact me if you have further questions or troubles. I am also happy to discuss questions in your project.

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 11:44, Dave @.**@.>> wrote:

Hi Dave, I just reran this isolate’s data, and got results below. Seems like your fileup file size is much smaller than mine, might explain why you got lower depth. Any chance you know what was happening there? 12K DRR034341.snp 93K DRR034341.fix 489K DRR034341.for 501K DRR034341.ppe 342M DRR034341.cns 541K DRR034341.varscan 929M DRR034341.pileup 14K DRR034341.sort.bam.bai 127M DRR034341.sort.bam 169M DRR034341.merge.bam 2.0M DRR034341.single.bam 168M DRR034341.paired.bam 529M DRR034341.paired.sam 5.2M DRR034341.single.sam 334K DRR034341_S.sai 15M DRR034341_R2.sai 15M DRR034341_R1.sai 222M DRR034341_tr_1.fq 202M DRR034341_tr_2.fq 4.3M DRR034341_tr_S.fq 2.0K run.sh 1.9K mapping.sh 90M DRR034341_1.fastq.gz 97M DRR034341_2.fastq.gz 4.0K DRR034341 Best, Qingyun

Dear Qingyun,

Thank you for taking the time to run the code and troubleshoot for me.

I re-compared the scripts and found a place that was somewhat different, in the two scripts of Unfix and Fix, although they are almost the same, there is a line of parameters that are not the same. In the Fix bash script, the parameter is -ABOf samtools mpileup -q 30 -Q 20 -ABOf ~/database/mtb/template/bwa/tb.ancestor.fasta ${Sample}.sort.bam > ${Sample}.pileup, while in Unfix script, samtools mpileup -q 30 -Q 20 -BOf ~/database/mtb/template/bwa/tb.ancestor.fasta ${Sample}.sort.bam > ${Sample}.pileup, the parameter is reduced by one A.

I ran the code of Fix_SNPs, and the size of pileup file is the same as your. Should the parameter in the Unfix script be the "-ABOf"?

However, the mixforkept file is still empty when I continue to run with the next steps of the unfix script. The size of the file and the scripts are listed below.

341 MB DRR034341.cns 20 B DRR034341.dep 91 KB DRR034341.fix 456 KB DRR034341.for 1.5 MB DRR034341.forup 168 MB DRR034341.merge.bam 11 KB DRR034341.mix 9.1 KB DRR034341.mixfor 9.3 KB DRR034341.mixfordisc 0 B DRR034341.mixforkept 166 MB DRR034341.paired.bam 529 MB DRR034341.paired.sam 920 MB DRR034341.pileup 470 KB DRR034341.ppe 1.9 MB DRR034341.single.bam 5.1 MB DRR034341.single.sam 12 KB DRR034341.snp 126 MB DRR034341.sort.bam 14 KB DRR034341.sort.bam.bai 530 KB DRR034341.varscan 15 MB DRR034341_R1.sai 15 MB DRR034341_R2.sai 334 KB DRR034341_S.sai 222 MB DRR034341_tr_1.fq 201 MB DRR034341_tr_2.fq 4.3 MB DRR034341_tr_S.fq 3.0 KB unfix.sh unfix.sh.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10353949_unfix.sh.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=U8_X--MZzAx6ilsmCWmlPsVn1Ix4rHE33d7QT-nai9cCNb4x834O_eNBVsI-p-D8&s=OB9Gmnr6qePGOAV2bn1KBRliftE96caV11iLJTyCaL0&e=

If you run all the scripts and get the new results, can you send me the script you run? I'd like to compare the difference between the two.

Once again, I express my sincere gratitude to you.

Yours, Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1372461672&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=U8_X--MZzAx6ilsmCWmlPsVn1Ix4rHE33d7QT-nai9cCNb4x834O_eNBVsI-p-D8&s=ZobEmOyVIfNGBOoDaOfEB0_2MoWGP2f-5ydSFXXa_yo&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOS5K4OFA536JY4H4UDWQ33ADANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=U8_X--MZzAx6ilsmCWmlPsVn1Ix4rHE33d7QT-nai9cCNb4x834O_eNBVsI-p-D8&s=HxOmSx8WdBFDPt1tL1U6EGSiaVaTTut4w4UPktKWo5o&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Dear Dave, Thanks for comparing that and so happy you find out the cause! Good job! Yes! Do include “-A”, we don’t want to lose those orphan reads! (-A, --count-orphans Do not skip anomalous read pairs in variant calling. Anomalous read pairs are those marked in the FLAG field as paired in sequencing but without the properly-paired flag set.) I will unify the scripts on Github shortly! Please don’t hesitate to contact me if you have further questions or troubles. I am also happy to discuss questions in your project. Best, Qingyun

Dear Qingyun,

Thank you very much for your answer to my parameter question.

However, I still encountered a problem when running the rest of the script (parts after varscan2). Specifically, when I used this command perl ~/script/redepin_filt.pl ~/script/Excluded_loci_mask.list DRR034341.dep DRR034341.mixfor, I did not generate a markkept file with content, but all the SNP sites were output to the markdisc file, which is clearly not right. I still do not know where the problem occurred.

Could you help me solve this problem? I hope to receive your reply in your spare time.

Yours,

Dave

MtbEvolution commented 1 year ago

Dear Dave,

Sure thing! Would you “ls -lth” these three files for me?

Excluded_loci_mask.list $dep $mixfor

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 19:12, Dave @.**@.>> wrote:

Dear Dave, Thanks for comparing that and so happy you find out the cause! Good job! Yes! Do include “-A”, we don’t want to lose those orphan reads! (-A, --count-orphans Do not skip anomalous read pairs in variant calling. Anomalous read pairs are those marked in the FLAG field as paired in sequencing but without the properly-paired flag set.) I will unify the scripts on Github shortly! Please don’t hesitate to contact me if you have further questions or troubles. I am also happy to discuss questions in your project. Best, Qingyun

Dear Qingyun,

Thank you very much for your answer to my parameter question.

However, I still encountered a problem when running the rest of the script (parts after varscan2). Specifically, when I used this command perl ~/script/redepin_filt.pl ~/script/Excluded_loci_mask.list DRR034341.dep DRR034341.mixfor, I did not generate a markkept file with content, but all the SNP sites were output to the discard file, which is clearly not right. I still do not know where the problem occurred.

Could you help me solve this problem? I hope to receive your reply in your spare time.

Yours,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1372958772&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=4N9dIM47tSYRMNKFo-jUU4hjVPvxK3XRLB2hrHXdXt88VLbh55tUICyMtIIPwqiy&s=NGNsaluL879ivJheI0YQPL_d0oMtDGkD86Q1BQRO8TY&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOS7EBEK7JIK6A74TWDWQ5PNVANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=4N9dIM47tSYRMNKFo-jUU4hjVPvxK3XRLB2hrHXdXt88VLbh55tUICyMtIIPwqiy&s=kRF1xb9o1MZ_ODbdl2E-x06b0ERcN_yEhjXQsBDgIzo&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Dear Dave, Sure thing! Would you “ls -lth” these three files for me? Excluded_loci_mask.list $dep $mixfor Qingyun

Dear Qingyun,

The size of these three files I put below and uploaded

9.1KB DRR034341.mixfor DRR034341.mixfor.txt

20B DRR034341.dep DRR034341.dep.txt

3.4MB Excluded_loci_mask.list Excluded_loci_mask.list.txt

Yours,

Dave

MtbEvolution commented 1 year ago

Dear Dave,

I found the error, the command line “perl ~/script/info_mark.pl $mixfor > $mixmark” was missing in the script.

I have updated this command in the "Update 0_Unfixed_SNPs_Calling.shhttps://github.com/MtbEvolution/resR_Project/commit/77ba30e8a60d0677130bd43c8cf9c33785c8f91c" and uploaded the “info_mark.pl”. Now it should work!

Thank you for notifying this! Really appreciate that! Let me know if there is any that question!

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 20:10, Dave @.**@.>> wrote:

Dear Dave, Sure thing! Would you “ls -lth” these three files for me? Excluded_loci_mask.list $dep $mixfor Qingyun

Dear Qingyun,

The size of these three files I put below and uploaded

9.1KB DRR034341.mixfor DRR034341.mixfor.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10356617_DRR034341.mixfor.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=kTrVriizIj2p1wH7b99QjKUaxyZmQIT9jkGHYaA9w6M&e=

20B DRR034341.dep DRR034341.dep.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10356618_DRR034341.dep.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=rAriaBUnigwWVp7hXO3TlCADihgxQkW6c5o1maQ37YQ&e=

3.4MB Excluded_loci_mask.list Excluded_loci_mask.list.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10356621_Excluded-5Floci-5Fmask.list.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=hofsP_noCf6KgWTFNL9RAK5d4J4lElUh17hvATVdI_Q&e=

Yours,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1373004812&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=gYaKOYVudyXez5IU0WaO4x7_LjefOU0Kv3Hup_AhWPQ&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOVJDQPRVE7VXRRRTRTWQ5WKDANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=3idbgETSn_rbsL6qJMtcYZAUt9L0cBSOPD6xnwgCF0I&e=. You are receiving this because you commented.Message ID: @.***>

MtbEvolution commented 1 year ago

In addition to that, I want to highlight that my analyzing pipeline can be used as a first-screening for unfixed SNPs. For those unfixed SNPs you find interesting in biology, please do remember to check them by inspecting the mapping landscape in bam files and see if something weird would show up. This is how I prioritize hits to follow up.

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 20:39, Qingyun Liu @.**@.>> wrote:

Dear Dave,

I found the error, the command line “perl ~/script/info_mark.pl $mixfor > $mixmark” was missing in the script.

I have updated this command in the "Update 0_Unfixed_SNPs_Calling.shhttps://github.com/MtbEvolution/resR_Project/commit/77ba30e8a60d0677130bd43c8cf9c33785c8f91c" and uploaded the “info_mark.pl”. Now it should work!

Thank you for notifying this! Really appreciate that! Let me know if there is any that question!

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 20:10, Dave @.**@.>> wrote:

Dear Dave, Sure thing! Would you “ls -lth” these three files for me? Excluded_loci_mask.list $dep $mixfor Qingyun

Dear Qingyun,

The size of these three files I put below and uploaded

9.1KB DRR034341.mixfor DRR034341.mixfor.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10356617_DRR034341.mixfor.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=kTrVriizIj2p1wH7b99QjKUaxyZmQIT9jkGHYaA9w6M&e=

20B DRR034341.dep DRR034341.dep.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10356618_DRR034341.dep.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=rAriaBUnigwWVp7hXO3TlCADihgxQkW6c5o1maQ37YQ&e=

3.4MB Excluded_loci_mask.list Excluded_loci_mask.list.txthttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_files_10356621_Excluded-5Floci-5Fmask.list.txt&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=hofsP_noCf6KgWTFNL9RAK5d4J4lElUh17hvATVdI_Q&e=

Yours,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1373004812&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=gYaKOYVudyXez5IU0WaO4x7_LjefOU0Kv3Hup_AhWPQ&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOVJDQPRVE7VXRRRTRTWQ5WKDANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=Q5makF3NCqCMQCEm-wjlf_UuaHF_c97Uaxm6jolEz1t2UDuxf0a3MYCTNKGGTj81&s=3idbgETSn_rbsL6qJMtcYZAUt9L0cBSOPD6xnwgCF0I&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Dear Dave, I found the error, the command line “perl ~/script/info_mark.pl $mixfor > $mixmark” was missing in the script. I have updated this command in the "Update 0_Unfixed_SNPs_Calling.sh<77ba30e>" and uploaded the “info_mark.pl”. Now it should work! Thank you for notifying this! Really appreciate that! Let me know if there is any that question! Best, Qingyun

Dear Qingyun,

I used your latest uploaded script and this time I got the correct result!

Thank you very much for your efforts to solve my problem!

In addition to expressing my gratitude, I am also wondering about some small questions about SNP calling:

  1. The reference genome used by the script is tb.ancestor, is this analysis method also applicable to h37rv?
  2. If 1 is applicable, when should I use h37rv and when should I use tb.ancestor?
  3. The removal of PPE, repeat regions, etc. is different between the Fixed SNPs script and Unfixed SNPs script. What are the differences between the two methods?

As my knowledge of mtbc is still very limited, the above questions may not be very professional, I hope you could understand me.

Yours,

Dave

MtbEvolution commented 1 year ago

Dear Dave,

I am so happy you have got the results!!! This is such a pleasure troubleshooting with you!

Of course happy to answer other questions as well:

  1. Basically, this ancestor template is built on H37Rv backbone, so it should apply to Rv, but I haven’t tested that.
  2. I will say ancestor will be a good start because you don’t want to get those mutations that are specific to Rv, right?
  3. This is a good question, for fixed SNPs, the criteria is quite loose in the beginning because we need to check the interested mutations through other methods in the end (the cns file, the pileup file, or PCR ideally). Please see my paper (Liu, Sci Rep, 2015), I used PCR and sanger sequencing to validate those interested mutations.

Always happy to answer questions and please don’t hesitate to reach out!

Best,

Qingyun

-- Qingyun Liu, PhD Department of Immunology and Infectious Diseases | Sarah Fortune Lab Harvard T.H. Chan School of Public Health 665 Huntington Ave., SPH 1, Room 815 | Boston, MA 02115

On Jan 5, 2023, at 21:31, Dave @.**@.>> wrote:

Dear Dave, I found the error, the command line “perl ~/script/info_mark.pl $mixfor > $mixmark” was missing in the script. I have updated this command in the "Update 0_Unfixed_SNPs_Calling.sh<77ba30ehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_commit_77ba30e8a60d0677130bd43c8cf9c33785c8f91c&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=We42vcyn4Ff2Agv1Ecz_g5p6BkRgrl-satSSmLkinGnFLF72epAk21SHulBxv2gn&s=FqPwbo9yH9UWHnz8kt3D3xzzTn9wsDyosLKP2r6_JBM&e=>" and uploaded the “info_mark.pl”. Now it should work! Thank you for notifying this! Really appreciate that! Let me know if there is any that question! Best, Qingyun

Dear Qingyun,

I used your latest uploaded script and this time I got the correct result!

Thank you very much for your efforts to solve my problem!

In addition to expressing my gratitude, I am also wondering about some small questions about SNP calling:

  1. The reference genome used by the script is tb.ancestor, is this analysis method also applicable to h37rv?
  2. If 1 is applicable, when should I use h37rv and when should I use tb.ancestor?
  3. The removal of PPE, repeat regions, etc. is different between the Fixed SNPs script and Unfixed SNPs script. What are the differences between the two methods?

As my knowledge of mtbc is still very limited, the above questions may not be very professional, I hope you could understand me.

Yours,

Dave

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MtbEvolution_resR-5FProject_issues_1-23issuecomment-2D1373062099&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=We42vcyn4Ff2Agv1Ecz_g5p6BkRgrl-satSSmLkinGnFLF72epAk21SHulBxv2gn&s=bprTWuKMDsqXCBAIaZaP-QtZ-SvJIn06J7cCIldxb7o&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AZVZDOTKT623F7HIDU3ZQITWQ57WNANCNFSM6AAAAAATRO4454&d=DwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=cieqRHKkzl-btGUA1OVqXaalXKkUEkDM4kCuY0Pu458&m=We42vcyn4Ff2Agv1Ecz_g5p6BkRgrl-satSSmLkinGnFLF72epAk21SHulBxv2gn&s=wVEnpgHZbxxvCxvddCdwAux5_KRdDyhFzPepqNfEddY&e=. You are receiving this because you commented.Message ID: @.***>

Dx-wmc commented 1 year ago

Dear Dave, I am so happy you have got the results!!! This is such a pleasure troubleshooting with you! Of course happy to answer other questions as well: 1. Basically, this ancestor template is built on H37Rv backbone, so it should apply to Rv, but I haven’t tested that. 2. I will say ancestor will be a good start because you don’t want to get those mutations that are specific to Rv, right? 3. This is a good question, for fixed SNPs, the criteria is quite loose in the beginning because we need to check the interested mutations through other methods in the end (the cns file, the pileup file, or PCR ideally). Please see my paper (Liu, Sci Rep, 2015), I used PCR and sanger sequencing to validate those interested mutations. Always happy to answer questions and please don’t hesitate to reach out! Best, Qingyun

Dear Qingyun,

Thank you very much for your detailed answers, I have learned a lot.

You have a great deal of experience in the field of TB and your work is really excellent. I hope I can learn more from you in the future, and also hope to have more opportunities to communicate like this!

Yours,

Dave