dpuiu / MitoHPC

MIT License
10 stars 12 forks source link

Error while running Run.sh #3

Open snashraf opened 1 year ago

snashraf commented 1 year ago

Hey !!

I was trying to run the command and getting the below error.

` $HP_SDIR/run.sh > run.all.sh

Regards, Najeeb

dpuiu commented 1 year ago

Hi Najeeb,

"test -w" tests if the user has write permissions to the align directory. It looks like your directory is read only. The pipeline needs to generate .idxstats and .count files for each sample (.bam file). If giving directory write permission is not an option, you create a new dir and symlink the bam/bai files.

snashraf commented 1 year ago

Hi Daniela , Thanks! I was able to fix the issue.

I think there is some issue with gridss command . Can you please check that. I have to run the pipeline on 6000 samples and we have LSF system as scheduler. whats best way to run this pararelly?

Regards, Najeeb

On Fri, 23 Sept 2022, 20:39 Daniela Puiu, @.***> wrote:

Hi Najeeb,

"test -w" tests if the user has write permissions to the align directory. It looks like your directory is read only. The pipeline needs to generate .idxstats and .count files for each sample (.bam file). If giving directory write permission is not an option, you create a new dir and symlink the bam/bai files.

— Reply to this email directly, view it on GitHub https://github.com/dpuiu/MitoHPC/issues/3#issuecomment-1256491426, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBFIN6SZMIWCEJPLPHFYUDV7XTLPANCNFSM6AAAAAAQSGDVNY . You are receiving this because you authored the thread.Message ID: @.***>

dpuiu commented 1 year ago

Najeeb,

Have you set export HP_V=gridss in init.sh?

If so, the problem might be related to the R dependency $ which Rscript /usr/bin/Rscript

I have to update scripts/install_sysprerequisites.sh and add apt-get install -y -q r-base

################################################################################### For the large job, I would suggest running a few samples first and check the completion status.

snashraf commented 1 year ago

Hi Daniela,

I was able to run the pipeline completely. I want to know which file I can use to understand the CN ( Copy Number ) information.

Regards, Najeeb

dpuiu commented 1 year ago

Hi Najeeb,

Glad you hear you have managed to run the pipeline.

Regarding mtDNA-CN :

Final count File:

 out/count.tab               # total reads & mtDNA-CN counts

Columns:

Run                           # Sample name

all                           # number of reads in the sample reads
mapped                        # number of aligned reads in the sample
MT                            # number of reads aligned to chrM
M                             # mtDNA copy number !!!

########################################################################

For each sample: scripts/filter.sh ; line 48 ; run idxstats2count.pl

cat $I.idxstats | \ idxstats2count.pl -sample $S -chrM $HP_RMT > $I.count

reads the idxstats file and adds up the number of mapped reads to each chromome => number of mapped reads ; prints out the total number of reads, total number of mapped reads and total number of reads aligned to chrM

############################################################################

At the end, run the scripts/getSummary.sh ; line 15 ; run getCN.pl

cut -f2 $HP_IN | \ perl -ane 'print "$1.count\n" if(/(.+)./);' | xargs cat | cut -f1,2,3,4 | uniq.pl | \ getCN.pl > $ODIR/count.tab

computes the total and chrM cvg

computes mtDNA-CN as 2×(chrM coverage)/(genome coverage)

Daniela Puiu

Bioinformatics Engineer

Department of Biomedical Engineering

Johns Hopkins University


From: Najeeb Ashraf Syed @.> Sent: Wednesday, September 28, 2022 7:18 AM To: dpuiu/MitoHPC @.> Cc: Daniela Puiu @.>; Comment @.> Subject: Re: [dpuiu/MitoHPC] Error while running Run.sh (Issue #3)

  External Email - Use Caution

Hi Daniela,

I was able to run the pipeline completely. I want to know which file I can use to understand the CN ( Copy Number ) information.

Regards, Najeeb

— Reply to this email directly, view it on GitHubhttps://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdpuiu%2FMitoHPC%2Fissues%2F3%23issuecomment-1260757213&data=05%7C01%7Cdpuiu%40jhu.edu%7Cc463b71b4ab646ef26fb08daa1432e34%7C9fa4f438b1e6473b803f86f8aedf0dec%7C0%7C0%7C637999607168204572%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LVTOeMvjxFw8qnes1uZkFkNfQQt2u1%2FXTYn6qRv9Ua4%3D&reserved=0, or unsubscribehttps://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAHXHM3M5HIBPRIJV4HQNYDWAQSQRANCNFSM6AAAAAAQSGDVNY&data=05%7C01%7Cdpuiu%40jhu.edu%7Cc463b71b4ab646ef26fb08daa1432e34%7C9fa4f438b1e6473b803f86f8aedf0dec%7C0%7C0%7C637999607168204572%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2FcLGeZkUNqLc58BO3BymhfS4bTTaErs99%2BE%2F0z2Wi88%3D&reserved=0. You are receiving this because you commented.Message ID: @.***>

snashraf commented 1 year ago

Thanks, Daniela,

I have run the analysis, and these are the files generated by the MitoHPC. I just want to understand what each file contains. Why some files named as ".mutect2.mutect2"?

SI000020000010_02000865410.mutect2.00.vcf SI000020000010_02000865410.mutect2.fix.vcf SI000020000010_02000865410.mutect2.max.vcf SI000020000010_02000865410.mutect2.mutect2.00.vcf SI000020000010_02000865410.mutect2.mutect2.fix.vcf SI000020000010_02000865410.mutect2.mutect2.orig.vcf SI000020000010_02000865410.mutect2.mutect2.vcf SI000020000010_02000865410.mutect2.orig.vcf SI000020000010_02000865410.mutect2.vcf

dpuiu commented 1 year ago

By default MitoHPC runs 2 iteration. The 1st iteration output files are called sample.mutect2. whjile the 2nd iteration output files are called sample.mutect2.mutect2. In the 1st iteration , the rCRS sequence (chrM) is used as reference for alignment/SNV calling. For each sample, a new consensus sequence is generated , which is then used as reference in the 2nd iteration. Using the sample consensus should be better for accurate heteroplasmy calling. The 2nd iteration detects only heteroplamic SNVs. However, as output, we readjust the coordinates relative to rCRS and merge them withe the homoplasmies detected in the 1st iteration.

snashraf commented 1 year ago

So if I want to have a final file containing both heteroplasmi and homoplasmi than file I should use? And same question also for mutserve?

dpuiu commented 1 year ago

Both SI000020000010_02000865410.mutect2.00.vcf and SI000020000010_02000865410.mutect2.mutect2.00.vcf will contain both homoplasmies and heteroplamies . However SI000020000010_02000865410.mutect2.mutect2.00.vcf should be more accurate,
Regarding mutserve, unfortunately it does not allow for a custom consensus sequence. If HP_M="mutserve", you need to set "HP_I=1"