cBioPortal / cbioportal

cBioPortal for Cancer Genomics
https://cbioportal.org
GNU Affero General Public License v3.0

Errors with importing CNAs for v5.2.3 #9957

Closed AHTARazzak closed 1 year ago

AHTARazzak commented 1 year ago

Hello,

I've recently updated our local cBioPortal instance to 5.2.3 to take advantage of @BasLee's new CNA input format.

However, when I try to load a new study I am met with the following error:

at org.mskcc.cbio.portal.scripts.ImportProfileData.main(ImportProfileData.java:189)
12:22:43.390 [main] INFO  o.m.cbio.portal.util.ServerDetector - Detected server null
Reading data from:  /study/data_microbiome.txt
Recaching...
Finished recaching...
--> profile id:  92
--> profile name:  Microbiome Signatures (log RNA Seq CPM)
--> genetic alteration type:  GENERIC_ASSAY
Reading data from: /study/data_microbiome.txt
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 1
at org.mskcc.cbio.portal.scripts.ImportGenericAssayEntity.importData(ImportGenericAssayEntity.java:193)
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:120)
at org.mskcc.cbio.portal.scripts.ConsoleRunnable.runInConsole(ConsoleRunnable.java:145)
at org.mskcc.cbio.portal.scripts.ImportProfileData.main(ImportProfileData.java:189)

ABORTED!
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 1
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:169)
at org.mskcc.cbio.portal.scripts.ConsoleRunnable.runInConsole(ConsoleRunnable.java:145)
at org.mskcc.cbio.portal.scripts.ImportProfileData.main(ImportProfileData.java:189)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 1
at org.mskcc.cbio.portal.scripts.ImportGenericAssayEntity.importData(ImportGenericAssayEntity.java:193)
at org.mskcc.cbio.portal.scripts.ImportProfileData.run(ImportProfileData.java:120)
... 2 more
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during data loading step. Please fix the problem and run this again to make sure study is completely loaded.
Traceback (most recent call last):
  File "/usr/local/bin/metaImport.py", line 202, in <module>
    cbioportalImporter.main(args)
  File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 543, in main
    process_directory(jvm_args, study_directory, args.update_generic_assay_entity)
  File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 372, in process_directory
    import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
  File "/cbioportal/core/src/main/scripts/importer/cbioportalImporter.py", line 162, in import_study_data
    run_java(*args)
  File "/cbioportal/core/src/main/scripts/importer/cbioportal_common.py", line 1001, in run_java
    raise RuntimeError('Aborting due to error while executing step.')
RuntimeError: Aborting due to error while executing step.

I suspect it is a result of: o.m.cbio.portal.util.ServerDetector - Detected server null

If I revert to an earlier version (5.1.0), I do not get this error and can load my data fine (as long as it doesn't use the new CNA format).

I'll tinker about but any guidance or advice is greatly appreciated.

BasLee commented 1 year ago

Maybe something changed with the stable IDs in data_microbiome.txt? https://github.com/cBioPortal/cbioportal/blob/master/core/src/main/java/org/mskcc/cbio/portal/scripts/ImportGenericAssayEntity.java#L193
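
One way to rule out a malformed header before re-running the importer is to check that the data file's header actually contains the ID column. The sketch below is only an illustration: it assumes the file is tab-separated, that comment lines start with `#`, and that the ID column is named `ENTITY_STABLE_ID`. The helper names are hypothetical and are not part of the cBioPortal importer.

```python
import io


def find_id_column(header_line, id_column="ENTITY_STABLE_ID"):
    """Return the index of id_column in a tab-separated header, or -1 if absent."""
    columns = header_line.rstrip("\r\n").split("\t")
    try:
        return columns.index(id_column)
    except ValueError:
        return -1


def header_has_id_column(handle, id_column="ENTITY_STABLE_ID"):
    """Check the first non-comment, non-blank line for the ID column."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        return find_id_column(line, id_column) >= 0
    return False  # no header line found at all


# Assumed sample content, not taken from the real data_microbiome.txt
sample = io.StringIO("ENTITY_STABLE_ID\tNAME\tSAMPLE_1\nGENUS_A\tGenus A\t0.5\n")
print(header_has_id_column(sample))
```

A header that is space-separated, or that lacks the column, makes `find_id_column` return -1. That is the kind of value that could produce an "Index -1" error if it were later used as an array index, although whether that is what happens at line 193 is not confirmed here.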

AHTARazzak commented 1 year ago

Hmm interesting, I don't even have a data_microbiome.txt file in my study so that could be the root cause. Good catch, will investigate tomorrow.

AHTARazzak commented 1 year ago

@BasLee While troubleshooting, I realised I had made a critical mistake: I tried loading a public study, acc_tcga_pan_can_atlas_2018.

That is what led to the errors above regarding data_microbiome.txt. Perhaps, as you've outlined, something has changed there.

More relevant to my own problem: I traced it to formatting problems in the expression files. There were 2 problem files:

They import fine in v5.1.0, though, so maybe something has changed since then?

Additionally, when I attempt to import the new CNA data (files attached) I am met with the following errors:

DEBUG: meta_cna_discrete.txt: Starting validation of meta file
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error occurred during validation step:
INFO: meta_cna_discrete.txt: Validation of meta file complete
Traceback (most recent call last):
  File "/usr/local/bin/metaImport.py", line 121, in <module>
    exitcode = validateData.main_validate(args)
  File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 5647, in main_validate
    validate_study(study_dir, portal_instance, logger, relaxed_mode, strict_maf_checks)
  File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 5370, in validate_study
    tags_file_path) = process_metadata_files(study_dir, portal_instance, logger, relaxed_mode, strict_maf_checks)
  File "/cbioportal/core/src/main/scripts/importer/validateData.py", line 4796, in process_metadata_files
    validator_class = globals()[VALIDATOR_IDS[meta_file_type]]
KeyError: 'meta_CNA_long'

data_cna_discrete.txt meta_cna_discrete.txt
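
For readers following the traceback: the failing line looks a validator class up through a dictionary keyed by meta file type, so a `KeyError` on `'meta_CNA_long'` suggests the table used at that point has no entry for that type. The sketch below reproduces that failure mode with stand-in names. `VALIDATOR_IDS_EXAMPLE`, `ExampleDiscreteValidator` and `resolve_validator` are hypothetical and are not copied from validateData.py.

```python
class ExampleDiscreteValidator:
    """Placeholder class standing in for a real validator."""


# Hypothetical lookup table: meta file type -> validator class name
VALIDATOR_IDS_EXAMPLE = {"meta_CNA_discrete": "ExampleDiscreteValidator"}

# Explicit namespace instead of globals(), so the sketch runs anywhere
NAMESPACE_EXAMPLE = {"ExampleDiscreteValidator": ExampleDiscreteValidator}


def resolve_validator(meta_file_type, validator_ids, namespace):
    """Resolve a validator class, failing with a readable message if unmapped."""
    if meta_file_type not in validator_ids:
        known = ", ".join(sorted(validator_ids))
        raise LookupError(
            f"No validator registered for {meta_file_type!r}; known types: {known}"
        )
    return namespace[validator_ids[meta_file_type]]


print(resolve_validator("meta_CNA_discrete", VALIDATOR_IDS_EXAMPLE, NAMESPACE_EXAMPLE))
try:
    resolve_validator("meta_CNA_long", VALIDATOR_IDS_EXAMPLE, NAMESPACE_EXAMPLE)
except LookupError as err:
    print(err)
```

The explicit membership check only changes how the problem is reported. It does not add support for the missing meta file type.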

AHTARazzak commented 1 year ago

As an update, Bas identified that there is a bug where the data_cna_discrete.txt file demands an entrez_id even if the hugo_symbol is filled when loading a study. Need to confirm but I think the intended fix is to allow the entrez_id value to be absent when a hugo_symbol is defined for an entry.
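
To make the intended rule concrete, here is a rough sketch of "either identifier is enough". It is not the actual fix that went into the importer. The column names `Hugo_Symbol` and `Entrez_Gene_Id` and the function name are assumptions for illustration, and the sample rows are made up.

```python
import csv
import io


def has_gene_identifier(row):
    """True if the row has a non-empty Hugo symbol or a non-empty Entrez ID."""
    hugo = (row.get("Hugo_Symbol") or "").strip()
    entrez = (row.get("Entrez_Gene_Id") or "").strip()
    return bool(hugo) or bool(entrez)


# Made-up tab-separated rows: Hugo only, Entrez only, and neither
sample = io.StringIO(
    "Hugo_Symbol\tEntrez_Gene_Id\tSample_Id\tValue\n"
    "TP53\t\tS1\t-2\n"
    "\t7157\tS2\t2\n"
    "\t\tS3\t0\n"
)
for row in csv.DictReader(sample, delimiter="\t"):
    print(row["Sample_Id"], has_gene_identifier(row))
```

Under this rule the first two rows pass and only the third, which has neither identifier, is flagged.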

BasLee commented 1 year ago

The import of CNA long-format entries with Hugo symbols instead of Entrez IDs should be fixed in release 5.2.4: https://github.com/cBioPortal/cbioportal/discussions/9964

AHTARazzak commented 1 year ago

Awesome, thanks Bas. Can verify the study with SVS data I wanted to load is successfully importing & visualising with just a hugo symbol, sample id & value.

As a side note, I thought the expression data import was also somehow affected, but I was totally wrong. It was an artifact of my transformation script, caused by the CNA transformation changes.

Thanks a bunch !