bioinfo-pf-curie / TMB

Tumor Mutational Burden

FileNotFoundError: [Errno 2] No such file or directory: 'pyeffg.intersect.bed' #5

Open deb0612 opened 3 years ago

deb0612 commented 3 years ago

Dear sir, when I use pyEffGenomeSize.py to calculate the effective genome size, I get an error:

$ python3 bin/pyEffGenomeSize.py --bed Roche_KAPA_HyperExome_hg38_capture_targets.bed --gtf gencode.v37.annotation.gtf

Traceback (most recent call last):
  File "bin/pyEffGenomeSize.py", line 207, in <module>
    getEffGenomeSizeFromMosdepth(args.oprefix +".intersect.bed")
  File "bin/pyEffGenomeSize.py", line 89, in getEffGenomeSizeFromMosdepth
    with open(infile, 'rt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'pyeffg.intersect.bed'

tomgutman commented 3 years ago

Hello, indeed. This happens because one of the arguments --filterCoding or --filterNonCoding is required to filter your BED file according to your needs; without it, the .intersect.bed file that the script later tries to open is never created. I just added an error message specifying this.

Thanks for your feedback
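
For readers who hit the same FileNotFoundError, the kind of check described above might look roughly like this. It is a minimal sketch, not the actual change made to pyEffGenomeSize.py: the parser layout, the parse_args helper and the error wording are assumptions. Only the flag names come from this thread, and the "pyeffg" default for --oprefix is inferred from the file name in the traceback.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the flag names are taken from the thread.
    parser = argparse.ArgumentParser(description="Sketch of a filter-flag check")
    parser.add_argument("--bed", required=True, help="capture targets (BED)")
    parser.add_argument("--gtf", required=True, help="gene annotation (GTF)")
    parser.add_argument("--filterCoding", action="store_true")
    parser.add_argument("--filterNonCoding", action="store_true")
    # Default inferred from 'pyeffg.intersect.bed' in the error message.
    parser.add_argument("--oprefix", default="pyeffg")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Fail early with a readable message instead of a later FileNotFoundError.
    if not (args.filterCoding or args.filterNonCoding):
        parser.error("one of --filterCoding or --filterNonCoding is required")
    return args
```

Calling parse_args(["--bed", "targets.bed", "--gtf", "annotation.gtf"]) with neither filter flag exits with a usage message, which is easier to act on than a missing-file traceback further down the script.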

deb0612 commented 3 years ago

When I add the --filterNonCoding argument, another error occurs:

Traceback (most recent call last):
  File "/NGS_Storage/Debbie/biotools/TMB/bin/pyEffGenomeSize.py", line 146, in <module>
    feature_filteredGtf = filteredGtf.filter(filterFeatureGtf, featuretypes).saveas("filtered_gtf.gtf")
  File "/home/user/miniconda3/envs/pyTMB2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/home/user/miniconda3/envs/pyTMB2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 3342, in saveas
    out_compressed=compressed,
  File "/home/user/miniconda3/envs/pyTMB2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 1412, in _collapse
    for i in iterable:
  File "pybedtools/cbedtools.pyx", line 759, in pybedtools.cbedtools.IntervalIterator.__next__
  File "/home/user/miniconda3/envs/pyTMB2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 962, in <genexpr>
    return BedTool((f for f in self if func(f, *args, **kwargs)))
  File "pybedtools/cbedtools.pyx", line 759, in pybedtools.cbedtools.IntervalIterator.__next__
  File "/home/user/miniconda3/envs/pyTMB2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 962, in <genexpr>
    return BedTool((f for f in self if func(f, *args, **kwargs)))
  File "/NGS_Storage/Debbie/biotools/TMB/bin/pyEffGenomeSize.py", line 70, in filterGtf
    if interval.attrs['transcript_type'] in featuretype:
KeyError: 'transcript_type'

tomgutman commented 3 years ago

Hi, I managed to reproduce and fix this error. It occurs because some annotations don't have the "transcript_type" attribute in the 9th column of the GTF.

Sorry for the inconvenience.
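
The KeyError in the traceback comes from indexing the attributes directly with ['transcript_type']. One way to tolerate records that lack the key is to look it up with a default, sketched below. This is not the fix that was committed to the repository: the helper names are hypothetical, and the example works on the raw text of the 9th GTF column rather than on pybedtools intervals so that it runs on its own.

```python
import re

# Matches `key "value"` pairs in the 9th GTF column.
# Unquoted values (e.g. `level 2`) are ignored in this sketch, and a
# repeated key (e.g. `tag`) keeps only its last value.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(column9):
    """Return the quoted attributes of one GTF record as a dict."""
    return dict(_ATTR_RE.findall(column9))


def keep_feature(column9, wanted_types, key="transcript_type"):
    """True only if the record carries `key` and its value is wanted.

    Records without the key (such as gene-level lines) are skipped
    instead of raising KeyError.
    """
    value = parse_gtf_attributes(column9).get(key)
    return value is not None and value in wanted_types
```

With this shape, a gene-level record such as gene_id "G1"; gene_type "protein_coding"; is simply filtered out, while a transcript-level record is kept or dropped according to its transcript_type value.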

Solenyalyl commented 2 years ago

I have transcript_id in the 9th column, and it reports 'gtf doesn't have transcript_type info ! Can't filter this file'. How can I use this code to calculate the effective genome size?

nservant commented 2 years ago

Yes, you have "transcript_id", but not "transcript_type" :) The code was developed with the GENCODE annotation, so I would suggest using the GENCODE GTF file that you can find here: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/
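
Before running the tool, it can help to check which attribute keys a GTF file actually carries in its 9th column. The helper below is a rough sketch for that purpose; count_attribute_keys is a hypothetical name and is not part of the TMB repository. It assumes tab-separated GTF lines and that attribute values contain no ";" characters.

```python
from collections import Counter


def count_attribute_keys(gtf_lines):
    """Count how often each attribute key appears in column 9."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                # The key is the first whitespace-delimited token.
                counts[item.split(None, 1)[0]] += 1
    return counts
```

A possible use is to open the annotation with open(path) as fh, pass fh to count_attribute_keys, and then test whether "transcript_type" is among the returned keys before choosing an annotation release.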

jpcartailler commented 1 year ago

Two years later, there are still GENCODE annotation releases (such as release 44) that do not have transcript_type in them, since they contain both gene and transcript annotations.

We cannot use the suggested release_19 since it's the wrong assembly (hg19) for our work (hg38).

With that said, release_20 seems to work for hg38 genomes with pyEffGenomeSize.py.

Sharing this since I figured someone might benefit from it.