broadinstitute / oncotator

Cache filename too long? #355

Open lebolo opened 8 years ago

lebolo commented 8 years ago

I'm using Oncotator v1.9.0.0 and am trying to speed up subsequent VCF-to-MAF runs by creating a file cache. I get the error pasted below. I think the problem is that Linux filesystems typically limit a filename to 255 bytes, and the cache filename for this mutation is 299 characters. Before this mutation, the longest filename I see is 255 characters:

12_57870602_57870602_GGAGGGGGGGCAGGGAGGATCTTGGCCTTCACAAAGAAATGGGAGATTCACATGGGGGTCGTCCAGGAGCTGCGGCAGAGGTCGAGGGGCCTGCAAGGCT_AGCCTTGCAGGCCCCTCGACCTCTGCCGCAGCTCCTGGACGACCCCCATGTGAATCTCCCATTTCTTTGTGAAGGCCAAGATCCTCCCTGCCCCCCCTCC_16348fb8e83ab2e686dfc2a90986f498.
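
To check the suspected limit rather than guess at it, the filesystem can be asked directly. The following is a minimal sketch, not part of Oncotator: the helper names `name_max` and `key_fits` are hypothetical, and it assumes that the cache key is used verbatim as the filename, which is what the filenames in this thread suggest.

```python
import os


def name_max(directory="."):
    """Return the maximum filename length, in bytes, for the given directory.

    Falls back to 255 (the usual value on Linux filesystems) when the
    platform cannot report it. The fallback is an assumption.
    """
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, ValueError, OSError):
        return 255


def key_fits(key, directory=".", limit=None):
    """Return True if `key` could be used as a filename in `directory`.

    `limit` overrides the value reported by the filesystem, which is
    convenient for testing.
    """
    if limit is None:
        limit = name_max(directory)
    # The limit applies to bytes, not characters.
    return len(key.encode("utf-8")) <= limit
```

Calling `key_fits(key, "/path/to/oncotator.cache")` with the key from the `KeyError` below would show whether that key can be stored as a file in the cache directory.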

Any thoughts on how I can fix this without modifying oncotator code?

Just in case, this is what my command line looks like:

oncotator \
  --input_format=VCF \
  --db-dir /path/to/trimmed.cache \
  -c /path/to/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt \
  -u file:///path/to/oncotator.cache \
  --log_name foo.log \
  foo.vcf \
  output.maf \
  hg19

Error

2016-09-09 10:12:28,501 ERROR [oncotator.output.TcgaMafOutputRenderer:333] Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/oncotator/output/TcgaMafOutputRenderer.py", line 321, in renderMutations
    for m in mutations:
  File "build/bdist.linux-x86_64/egg/oncotator/Annotator.py", line 448, in _applyManualAnnotations
    for m in mutations:
  File "build/bdist.linux-x86_64/egg/oncotator/Annotator.py", line 456, in _applyDefaultAnnotations
    for m in mutations:
  File "build/bdist.linux-x86_64/egg/oncotator/Annotator.py", line 523, in _annotate_mutations_using_datasources
    self._cacheManager.store_annotations_in_cache(m)
  File "build/bdist.linux-x86_64/egg/oncotator/cache/CacheManager.py", line 85, in store_annotations_in_cache
    self._store_basic_annotations_in_cache(cache_key, annotations)
  File "build/bdist.linux-x86_64/egg/oncotator/cache/CacheManager.py", line 91, in _store_basic_annotations_in_cache
    self.get_cache().store_into_cache(key, annotations)
  File "build/bdist.linux-x86_64/egg/oncotator/cache/ShoveCache.py", line 71, in store_into_cache
    self.db[key] = value
  File "/opt/conda/lib/python2.7/site-packages/shove-0.6.6-py2.7.egg/shove/core.py", line 44, in __setitem__
    self.sync()
  File "/opt/conda/lib/python2.7/site-packages/shove-0.6.6-py2.7.egg/shove/core.py", line 74, in sync
    self._store.update(self._buffer)
  File "/opt/conda/lib/python2.7/_abcoll.py", line 566, in update
    self[key] = other[key]
  File "/opt/conda/lib/python2.7/site-packages/shove-0.6.6-py2.7.egg/shove/base.py", line 123, in __setitem__
    raise KeyError(key)
KeyError: '12_62775059_62775301_ATTTATTATAAGTAATTATTTTGAAATTTGGGAAATGAATAGTTGAGTACATCATTTGAGTGTTTATTTGACCAGTTATTAGTGCAAGAAACCATCATTTAAATGTTCTCTGTGTTATAGCAAATCAATTTGTTTCTCTCTTCATATTAATGTGTTTTAAGAAAAATTCATTATGATTACTTTTACTTGAAAATATATTTTATTTTTTGCAGTGTTTGAGCAACACACCTCCACTTACTGAGT_-_16348fb8e83ab2e686dfc2a90986f498'

2016-09-09 10:12:28,501 ERROR [oncotator.output.TcgaMafOutputRenderer:334] Error at mutation 1746159 ['12', 62775059, 62775301, 'ATTTATTATAAGTAATTATTTTGAAATTTGGGAAATGAA  TAGTTGAGTACATCATTTGAGTGTTTATTTGACCAGTTATTAGTGCAAGAAACCATCATTTAAATGTTCTCTGTGTTATAGCAAATCAATTTGTTTCTCTCTTCATATTAATGTGTTTTAAGAAAAATTCATTATGATTACTTTTACTTGAAAATATATTTTATTTTT  TGCAGTGTTTGAGCAACACACCTCCACTTACTGAGT', '-']:
2016-09-09 10:12:28,501 ERROR [oncotator.output.TcgaMafOutputRenderer:335] Incomplete: rendered 1746159 mutations.
Traceback (most recent call last):
  File "/opt/conda/bin/oncotator", line 11, in <module>
    load_entry_point('Oncotator==1.9.0.0', 'console_scripts', 'oncotator')()
  File "build/bdist.linux-x86_64/egg/oncotator/Oncotator.py", line 309, in main
  File "build/bdist.linux-x86_64/egg/oncotator/Annotator.py", line 437, in annotate
  File "build/bdist.linux-x86_64/egg/oncotator/output/TcgaMafOutputRenderer.py", line 337, in renderMutations
KeyError: '12_62775059_62775301_ATTTATTATAAGTAATTATTTTGAAATTTGGGAAATGAATAGTTGAGTACATCATTTGAGTGTTTATTTGACCAGTTATTAGTGCAAGAAACCATCATTTAAATGTTCTCTGTGTTATAGCAAATCAATTTGTTTC  TCTCTTCATATTAATGTGTTTTAAGAAAAATTCATTATGATTACTTTTACTTGAAAATATATTTTATTTTTTGCAGTGTTTGAGCAACACACCTCCACTTACTGAGT_-_16348fb8e83ab2e686dfc2a90986f498'

LeeTL1220 commented 8 years ago

What is trimmed.cache? Is that a proper datasource dir?

Also, are you sure that your VCF is valid at this mutation? It looks like it has spaces in it...

lebolo commented 8 years ago

trimmed.cache is a proper datasource directory. If I remove the caching option, Oncotator runs successfully through the whole VCF with this configuration. The spaces are an artifact of my copy-pasting; the VCF is valid throughout.
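
One possible workaround that leaves Oncotator untouched is to split the VCF before annotation, so that records whose cache key would exceed the filename limit go into a separate file. The sketch below is an illustration under stated assumptions, not a confirmed fix. It assumes the key layout seen in the error above (`chrom_start_end_ref_alt_hash` with a 32-character hash), and the function names are hypothetical. The estimate is rough: VCF keeps an anchor base for indels, so it tends to come out slightly larger than the key Oncotator builds.

```python
HASH_LEN = 32  # length of the hex suffix seen in the keys above (assumed constant)


def estimated_key_length(chrom, pos, ref, alt):
    """Roughly estimate the cache key length for one VCF record."""
    end = pos + len(ref) - 1
    # For multi-allelic records, use the longest alternate allele.
    longest_alt = max(alt.split(","), key=len)
    fields = [chrom, str(pos), str(end), ref, longest_alt]
    # Five fields plus the hash, joined by five underscores.
    return sum(len(f) for f in fields) + HASH_LEN + 5


def split_vcf_lines(lines, limit=255):
    """Split VCF lines into (header, cacheable, too_long) lists."""
    header, cacheable, too_long = [], [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = cols[0], int(cols[1]), cols[3], cols[4]
        if estimated_key_length(chrom, pos, ref, alt) <= limit:
            cacheable.append(line)
        else:
            too_long.append(line)
    return header, cacheable, too_long
```

Writing `header + cacheable` and `header + too_long` to two VCF files would allow the first to be annotated with the `-u` cache option and the second without it, since the run is reported to succeed when caching is disabled.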