Sage-Bionetworks / Genie

Validation and processing of GENIE files
https://genie.synapse.org/
MIT License

fix: update maf read_csv #528

Closed jakevc closed 1 year ago

jakevc commented 1 year ago

I ran into an issue validating my data_mutations_extended.txt file. This was the output:

→genie validate data_mutations_extended_PROV.txt PROV
Welcome, Jake  VanCampen!

INFO:synapseclient_default:Welcome, Jake  VanCampen!

Downloading  [####################]100.00%   1.7kB/1.7kB (1.4MB/s) SYNAPSE_TABLE_QUERY_125739515.csv Done...
Downloading  [####################]100.00%   1.1kB/1.1kB (1.9MB/s) SYNAPSE_TABLE_QUERY_125734614.csv Done...
Downloading  [--------------------]   0.0bytes (0.0bytes/s) tree     [WARNING] /Users/jake.vancampenprovidence.org/Projects/Genie/genie_registry/maf.py:339: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  mutationdf = pd.read_csv(

WARNING:py.warnings:/Users/jake.vancampenprovidence.org/Projects/Genie/genie_registry/maf.py:339: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  mutationdf = pd.read_csv(

INFO:genie.example_filetype_format:VALIDATING data_mutations_extended_PROV.txt
ERROR:genie.example_filetype_format:maf: Please double check your CHROMOSOME column.  This column must only be these values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT

After making the change in this PR to the validator code, this is the output now:

→genie validate data_mutations_extended_PROV.txt PROV
Welcome, Jake  VanCampen!

INFO:synapseclient_default:Welcome, Jake  VanCampen!

INFO:genie.example_filetype_format:VALIDATING data_mutations_extended_PROV.txt
INFO:genie.example_filetype_format:YOUR FILE IS VALIDATED!

Yay!
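
For context, the DtypeWarning in the first run suggests that pandas inferred types chunk by chunk and left a mix of integers and strings in the flagged column. A check against string values such as "1" or "X" would then reject the integer entries. Below is a minimal sketch of that failure mode and one possible remedy. It assumes the affected column is named `Chromosome` and that the remedy is an explicit `dtype` in `pd.read_csv`; the input text is made up for illustration, and the actual change in `genie_registry/maf.py` may differ.

```python
import io

import pandas as pd

# Allowed values, copied from the validator's error message.
ALLOWED_CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]

# A mixed-type column like the one the DtypeWarning describes: some values
# ended up as integers, others as strings. Integers never match "1", "2", ...
mixed = pd.Series([1, 2, "X"], dtype=object)
print(mixed.isin(ALLOWED_CHROMOSOMES).tolist())

# Hypothetical MAF-like input; "Chromosome" is an assumed column name.
maf_text = "Hugo_Symbol\tChromosome\nTP53\t17\nAR\tX\nMT-ND1\tMT\n"

# Reading the column explicitly as strings sidesteps type inference entirely.
mutationdf = pd.read_csv(
    io.StringIO(maf_text),
    sep="\t",
    dtype={"Chromosome": str},
)
print(mutationdf["Chromosome"].isin(ALLOWED_CHROMOSOMES).tolist())
```

Passing `low_memory=False`, as the warning itself suggests, is another option, but it can still infer an all-numeric column as integers, so an explicit string dtype is the more predictable choice for a column that is compared against string values.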

jakevc commented 1 year ago

Also reflected in my latest errors.txt file: https://www.synapse.org/#!Synapse:syn25994092

thomasyu888 commented 1 year ago

@jakevc We are going to close this, but @rxu17 did cherry-pick this and added some tests: https://github.com/Sage-Bionetworks/Genie/pull/530/commits/4ae239bcda9e7f33ceff5b1975f794ffc8115218.

Thanks for your contribution!