Open mihinduk opened 5 years ago
I have solved the problem for PE data by setting -T 2 and requesting more memory based on the stats generated by running /usr/bin/time -v . For SE data, if I run the identical command on a local server it runs, but when I try to run on a remote server using Docker, I get the following error:
Command: /usr/bin/time -v DCC @/gscmnt/gc2645/wgs/km_test/dcc/msbb/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/BM36/samplesheet -mt1 @/gscmnt/gc2645/wgs/km_test/dcc/msbb/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/BM36/read -T 2 -D -N -R /gscmnt/gc2645/wgs/resources/RNAseq/genome/hg19_Repeats_RepeatMasker_SimpleRepeats.gtf -an /gscmnt/gc2645/wgs/resources/RNAseq/genome/gencode.v19.annotation.spike-in.gtf -F -M -Nr 1 1 -fg -k -G -A /gscmnt/gc2645/wgs/resources/RNAseq/genome/GRCh37.p13.genome.lite.spike-in.fa -B @/gscmnt/gc2645/wgs/km_test/dcc/msbb/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/BM36/bam_files -O /gscmnt/gc2645/wgs/km_test/dcc/msbb/02.-ProcessedData/06.-circRNA/Hg19/DCC/BM36 -t /gscmnt/gc2645/wgs/km_test/dcc/msbb/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/BM36/_tmp_DCC
Error:
Traceback (most recent call last):
File "/usr/local/bin/DCC", line 11, in
I ran into the same issue. Do you have any ideas on how to solve it?
I would try increasing the amount of memory you request:

| Dataset | tissue | samples | Cores | time | Max mem |
| --- | --- | --- | --- | --- | --- |
| MSBB | BM10 | 325 | 2 | 56:43:04 | 282082312 |
| MSBB | BM22 | 334 | 4 | 44:50:42 | 313739268 |
| MSBB | BM36 | 315 | 4 | 37:54:52 | 296575084 |
| MSBB | BM44 | 308 | 4 | 40:16:14 | 286656860 |
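To turn numbers like these into a scheduler request, a small conversion helps. The sketch below assumes the Max mem column is the "Maximum resident set size (kbytes)" field printed by /usr/bin/time -v; the table itself does not state the unit. The function names and the 20% headroom are illustrative choices, not anything DCC provides.

```python
import math

# Assumption: the values are kbytes as reported by GNU time -v.
KIB_PER_GIB = 1024 ** 2


def kib_to_gib(kib):
    """Convert a kbyte value to GiB."""
    return kib / float(KIB_PER_GIB)


def suggested_request_gib(max_rss_kib, headroom=0.2):
    """Round the observed peak up to whole GiB after adding some headroom."""
    return int(math.ceil(kib_to_gib(max_rss_kib) * (1.0 + headroom)))


# Peak values copied from the table above.
runs = {
    "BM10": 282082312,
    "BM22": 313739268,
    "BM36": 296575084,
    "BM44": 286656860,
}

for tissue, rss in sorted(runs.items()):
    print("%s: peak %.1f GiB, request about %d GiB"
          % (tissue, kib_to_gib(rss), suggested_request_gib(rss)))
```

If the column is in a different unit, only KIB_PER_GIB needs to change.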
Thanks for your reply! Is this the solution for the first issue or the second issue?
Hi @mihinduk, hi @JunmingH,
thank you for reporting the issues and for your patience.
Increasing the memory should fix issue 1, since the error message there refers to a bad memory address: IOError: [Errno 14] Bad address. So requesting more memory in cluster-scheduled environments will solve that issue.
The second issue looks familiar. Could it be that you forgot to specify -mt2?
Cheers, Tobias
The second error is on SE data. It takes more memory and time than the PE data. I was able to get this to run locally, but not yet on a server, using a Dockerfile.
Hi @mihinduk,
for SE data you must not use -mt1. The -mt1 and -mt2 options are reserved for PE setups.
Cheers, Tobias
Hi Tobias,
Do you have any idea why this would run locally: /usr/bin/time -v DCC @/40/Public_Data/bulkRNASeq/201812_MSBB/Gene_Expression/02.-ProcessedData/06.-circRNA/Hg19/BM44/DCC/DCC_InputFiles/samplesheet -mt1 @/40/Public_Data/bulkRNASeq/201812_MSBB/Gene_Expression/02.-ProcessedData/06.-circRNA/Hg19/BM44/DCC/DCC_InputFiles/read -T 4 -D -N -R /40/pipelines/RNAseq/circRNA/hg19_Repeats_RepeatMasker_SimpleRepeats.gtf -an /40/pipelines/RNAseq/circRNA/Hg19_gencodev19_spikein/gencode.v19.annotation.spike-in.gtf -F -M -Nr 1 1 -fg -k -G -A /40/pipelines/RNAseq/circRNA/Hg19_gencodev19_spikein/GRCh37.p13.genome.lite.spike-in.fa -B @/40/Public_Data/bulkRNASeq/201812_MSBB/Gene_Expression/02.-ProcessedData/06.-circRNA/Hg19/BM44/DCC/DCC_InputFiles/bam_files -O /40/Public_Data/bulkRNASeq/201812_MSBB/Gene_Expression/02.-ProcessedData/06.-circRNA/Hg19/BM44/DCC/ -t /40/Public_Data/bulkRNASeq/201812_MSBB/Gene_Expression/02.-ProcessedData/06.-circRNA/Hg19/BM44/DCC/_tmp_DCC
I am not sure how DCC handles the case where only mate1 is supplied, as in your example. Generally it is either both -mt1 AND -mt2, or neither of them. I am pretty sure your command with only -mt1 is not behaving correctly. Anyway, this needs to be addressed in the code before DCC starts running.
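A check of this kind could look roughly like the following. This is only a sketch of the "both or neither" rule described above, not DCC's actual argument handling; build_parser, check_mate_options and the dest names mate1/mate2 are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser covering only the two mate options.
    parser = argparse.ArgumentParser(
        description="Sketch of a mate-option check (not DCC's own code)")
    parser.add_argument("-mt1", dest="mate1", nargs="+", default=None)
    parser.add_argument("-mt2", dest="mate2", nargs="+", default=None)
    return parser


def check_mate_options(mate1, mate2):
    """Return the run mode, or raise if only one mate option was given."""
    if (mate1 is None) != (mate2 is None):
        raise ValueError(
            "-mt1 and -mt2 must be given together (paired-end) "
            "or both omitted (single-end)")
    return "paired-end" if mate1 is not None else "single-end"


if __name__ == "__main__":
    args = build_parser().parse_args()
    try:
        print(check_mate_options(args.mate1, args.mate2))
    except ValueError as err:
        raise SystemExit("error: %s" % err)
```

Running the check right after argument parsing means a command with only -mt1 fails immediately with a readable message instead of partway through a long run.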
@tjakobi Hi Tobias, I am also running SE data, without specifying -mt1 or -mt2, but I still get the same error message. I also tried listing all the files with their locations in a single file and running with that, but I get the same error.
Hi @JunmingH,
are you referring to this error?
remove_empty_lines
TypeError: 'NoneType' object is not iterable
Command exited with non-zero status 1
Cheers, Tobias
@tjakobi Hi Tobias,
I have a question about combining the results. I could not run all files at the same time, so I ran them one by one and stored the results in different directories. How can I combine them? Each subject has different results, so how should I treat circRNAs that are missing from some of them?
Thanks!
Hi @JunmingH,
please see my response in your new issue: https://github.com/dieterich-lab/DCC/issues/72#issuecomment-557919252
Cheers, Tobias
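For readers who cannot follow the link, one possible way to merge per-run count tables is sketched below. It is not necessarily what the linked response recommends. It assumes each run wrote a tab-separated count table whose first four columns are Chr, Start, End and Strand, followed by one column per sample; read_counts, merge_counts and the example rows are made up for illustration.

```python
import csv
import io

KEY_COLUMNS = 4  # assumed layout: Chr, Start, End, Strand, then samples


def read_counts(handle):
    """Read one count table into (sample names, {coordinates: counts})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[KEY_COLUMNS:]
    table = {}
    for row in reader:
        if row:
            key = tuple(row[:KEY_COLUMNS])
            table[key] = dict(zip(samples, row[KEY_COLUMNS:]))
    return samples, table


def merge_counts(tables, fill="NA"):
    """Outer-join tables on coordinates; absent entries get `fill`."""
    all_samples = []
    merged = {}
    for samples, table in tables:
        all_samples.extend(samples)
        for key, counts in table.items():
            merged.setdefault(key, {}).update(counts)
    ordered = sorted(merged, key=lambda k: (k[0], int(k[1]), int(k[2]), k[3]))
    rows = [list(k) + [merged[k].get(s, fill) for s in all_samples]
            for k in ordered]
    return all_samples, rows


# Made-up example tables standing in for two separate runs.
run_a = io.StringIO(u"Chr\tStart\tEnd\tStrand\tsubject1\n"
                    u"chr1\t100\t200\t+\t5\n")
run_b = io.StringIO(u"Chr\tStart\tEnd\tStrand\tsubject2\n"
                    u"chr1\t100\t200\t+\t3\n"
                    u"chr2\t50\t90\t-\t7\n")

samples, rows = merge_counts([read_counts(run_a), read_counts(run_b)])
for row in rows:
    print("\t".join(row))
```

Whether a circRNA absent from one run should count as 0 or as missing is a judgment call, since a run may have dropped it during filtering. That is why fill is a parameter here rather than a hard-coded 0.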
@tjakobi Yes, it is the same error, in remove_empty_lines: TypeError: 'NoneType' object is not iterable, followed by: Command exited with non-zero status 1
Hi @JunmingH,
could you please make sure that your BAM input list file does not contain any empty lines?
Also, I would like to see your complete DCC call. Did you use @filename to specify the input list?
Cheers, Tobias
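A quick way to check a list file for empty lines before passing it via @filename is sketched below. The helper names are hypothetical and this is independent of DCC's own remove_empty_lines code.

```python
def find_blank_lines(path):
    """Return the 1-based numbers of empty or whitespace-only lines."""
    blank = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                blank.append(number)
    return blank


def read_entries(path):
    """Return the non-empty lines of a list file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, find_blank_lines("bam_files") returns an empty list when the file is clean, and read_entries can be used to write a cleaned copy.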
@tjakobi Sure
python2 ${app_dir}/main.py @samplesheet \
  -D -N -R ${gtf_dir}/GRCh38_Repeats_simpleRepeats_RepeatMasker.gtf \
  -an ref/GRCh38/annotation/Homo_sapiens.GRCh38.95.gtf \
  -F -M -Nr 1 1 -fg -G -A ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
  -T 2 -O /dcc_all_results/ \
  -B @bam_files
The format for the sample sheet and bam_files is like this:
/align/subject1.sort.coord.combined_Chimeric.out.junction
/align/subject2.sort.coord.combined_Chimeric.out.junction
/align/subject3.sort.coord.combined_Chimeric.out.junction

/align/subject1.sort.coord.combined_Aligned.sortedByCoord.out.bam
/align/subject2.sort.coord.combined_Aligned.sortedByCoord.out.bam
/align/subject3.sort.coord.combined_Aligned.sortedByCoord.out.bam
Hi @JunmingH,
looks good. Could you please attach the original bam_files and samplesheet files?
Cheers, Tobias
Could you give me an email address? I can send it to you!
You can directly upload files here on GitHub via the area under the text field ("Attach files by dragging...")
Describe the bug
DCC quits when trying to combine individual circRNA read counts. This happens only when I run it using Docker in an interactive queue:
bsub -Is -q research-hpc \
  -a 'docker(buddej/dcc:0.1.3)' \
  /bin/bash
I have run this locally without issue and am trying to run it on a larger server so that I can process datasets that demand too much memory for my local machine (although I am testing on a small dataset of 83 samples).
Since I had to convert the wrapper I wrote from Python 3 to Python 2.7.16 to be compatible with DCC, I have isolated the actual DCC command (the previous steps of generating the input files worked) and have just been running this inside the interactive queue:
To Reproduce
Steps to reproduce the behavior:
Command line used for the command: DCC @/gscmnt/gc2645/wgs/km_test/dcc/gtex/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/Amygdala/samplesheet -mt1 @/gscmnt/gc2645/wgs/km_test/dcc/gtex/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/Amygdala/mate1 -mt2 @/gscmnt/gc2645/wgs/km_test/dcc/gtex/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/Amygdala/mate2 -T 20 -D -R /gscmnt/gc2645/wgs/resources/RNAseq/genome/hg19_Repeats_RepeatMasker_SimpleRepeats.gtf -an /gscmnt/gc2645/wgs/resources/RNAseq/genome/gencode.v19.annotation.spike-in.gtf -Pi -F -M -Nr 1 1 -fg -k -G -A /gscmnt/gc2645/wgs/resources/RNAseq/genome/GRCh37.p13.genome.lite.spike-in.fa -B @/gscmnt/gc2645/wgs/km_test/dcc/gtex/02.-ProcessedData/06.-circRNA/Hg19/DCC/DCC_InputFiles/Amygdala/bam_files
Complete error message
finished circRNA detection from file _tmp_DCC/SRR818418_unified.Chimeric.out.junction.7MIX0L
Combining individual circRNA read counts
Traceback (most recent call last):
  File "/usr/local/bin/DCC", line 11, in
    load_entry_point('DCC==0.4.7', 'console_scripts', 'DCC')()
  File "/usr/local/lib/python2.7/site-packages/DCC-0.4.7-py2.7.egg/DCC/main.py", line 287, in main
  File "/usr/local/lib/python2.7/site-packages/DCC-0.4.7-py2.7.egg/DCC/circAnnotate.py", line 26, in selectGeneGtf
  File "/usr/local/lib/python2.7/site-packages/HTSeq/__init__.py", line 197, in __iter__
    for line in FileOrSequence.__iter__(self):
  File "/usr/local/lib/python2.7/site-packages/HTSeq/__init__.py", line 50, in __iter__
    for line in lines:
IOError: [Errno 14] Bad address
Desktop (please complete the following information):
python = Python 2.7.16
Version = DCC 0.4.7
Dockerfile = buddej/dcc:0.1.3 https://github.com/buddej/mgi-hpc/blob/master/dcc/Dockerfile
Any advice you can give would be greatly appreciated.
Thank you, Kathie Mihindukulasuriya