While trying to run both the latest commit and the latest pip release, I kept getting a KeyError:
signatureanalyzer -t maf -n 4 --hg_build /aztlan/refs/Homo_sapiens_assembly19.2bit --cosmic cosmic3_ID rebc356_thca48.16MAR2020.maf
Unable to init server: Could not connect: Connection refused
Unable to init server: Could not connect: Connection refused
(signatureanalyzer:24793): Gdk-CRITICAL **: 10:34:16.643: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
/home/eric/.local/lib/python3.6/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.neighbors.base module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
warnings.warn(message, FutureWarning)
---------------------------------------------------------
---------- S I G N A T U R E A N A L Y Z E R ----------
---------------------------------------------------------
* Using Homo_sapiens_assembly19 build
* Using cosmic3_ID signatures
* Loading spectra from rebc356_thca48.16MAR2020.maf
sys:1: DtypeWarning: Columns (78,79,83,87,88,92,93,94,100,102,103,108,110,114,115,116,118,120,135,136,137,138,139,140,141,142,143,144,149,160,161,162,163,168,177) have mixed types.Specify dtype option on import or set low_memory=False.
* Mapping contexts: 0 / 23933Traceback (most recent call last):
  File "/home/eric/.local/bin/signatureanalyzer", line 11, in <module>
    load_entry_point('signatureanalyzer', 'console_scripts', 'signatureanalyzer')()
  File "/home/eric/getzlab-SignatureAnalyzer/signatureanalyzer/__main__.py", line 181, in main
    **vars(args)
  File "/home/eric/getzlab-SignatureAnalyzer/signatureanalyzer/signatureanalyzer.py", line 87, in run_maf
    cosmic=cosmic
  File "/home/eric/getzlab-SignatureAnalyzer/signatureanalyzer/spectra.py", line 194, in get_spectra_from_maf
    _context = hg[chromosome][pos - 1 + del_len:pos - 1 + del_len * 6].upper()
KeyError: 'chr1'
(Please excuse the X server errors. I don't know why it's complaining about not having X forwarding, but it doesn't seem to affect anything.)
I traced it down to this line: https://github.com/broadinstitute/getzlab-SignatureAnalyzer/blob/bdab04164fa27b6bd9fbf3b344e8b3772b7fe2ef/signatureanalyzer/spectra.py#L98
It looks like the "chr" prefix gets prepended to the chromosome names from the MAF but not to the chromosome names in the reference 2bit, so the lookup fails. I was able to run successfully after removing this line and an identical one later in the file.
I'm happy to make a PR to remove these lines, but I assume they were put there for a purpose. Maybe because they need to be coerced to strings and numpy is trying to interpret them as ints?
Generally I try to avoid enforcing prefixes on references/VCFs/MAFs because it's easy to end up with mismatches like this. Every center has its own way of resolving them, and whenever I have enforced one convention or the other I have usually created more problems than I started with.
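As a rough sketch of what I mean, the lookup could adapt to whatever naming the reference uses instead of forcing a prefix. The helper name `resolve_contig` and the toy `reference_contigs` set are hypothetical and not part of SignatureAnalyzer; I'm also assuming the loaded reference can report its sequence names.

```python
def resolve_contig(name, available):
    """Return the entry of `available` that matches `name`,
    tolerating a 'chr' prefix mismatch in either direction.

    `available` is any container of contig names (hypothetically,
    the sequence names of the loaded reference).
    """
    # MAF chromosome columns are sometimes parsed as ints, so coerce first.
    name = str(name)
    bare = name[3:] if name.startswith("chr") else name
    # Try the name as given, then without and with the prefix.
    for candidate in (name, bare, "chr" + bare):
        if candidate in available:
            return candidate
    raise KeyError(
        f"contig {name!r} not found in reference (tried with and without 'chr')"
    )


# Toy stand-in for a reference whose contigs have no 'chr' prefix.
reference_contigs = {"1", "2", "X", "MT"}
print(resolve_contig("chr1", reference_contigs))  # 1
print(resolve_contig(2, reference_contigs))       # 2
```

This only handles the "chr" prefix; other naming differences (for example chrM versus MT) would still need an explicit mapping.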
I assume this would also lead to the MAF or spectra output having the "chr" prefix, or are they chomped off before final output?