cBioPortal / cbioportal-core

Externalized cBioPortal Core

Speed up CnaEvent lookup during import #25

Status: Open. sheridancbio opened this pull request 8 months ago.

sheridancbio commented 8 months ago

This is an in-progress attempt to implement an efficient lookup of existing CnaEvent.Event references, as intended by #1. That PR had a bug: it took the event_id value from the newly created CnaEvent parsed from the data_cna.txt file. CnaEvent objects constructed during that parsing have no event_id set (it defaults to 0). Instead, the event_id must be taken from the matching existing CnaEvent.Event object retrieved from the database through the DaoCnaEvent.getAllCnaEvents() call.
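A minimal sketch of the corrected idea, using a hypothetical simplified event class (`CnaEventStub`; the real `CnaEvent.Event` fields and types may differ) and a plain list standing in for the result of `DaoCnaEvent.getAllCnaEvents()`. The point it shows is that the `event_id` is read from the matching stored event, never from the freshly parsed one. This version uses a simple linear scan.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for CnaEvent.Event; real field names and types may differ.
final class CnaEventStub {
    long eventId;            // set by the database; defaults to 0 for freshly parsed events
    final long entrezGeneId; // hypothetical gene identifier
    final int alteration;    // hypothetical copy-number code, e.g. -2 or 2

    CnaEventStub(long eventId, long entrezGeneId, int alteration) {
        this.eventId = eventId;
        this.entrezGeneId = entrezGeneId;
        this.alteration = alteration;
    }
}

final class LinearEventIdLookup {
    // Returns the event_id of the stored event matching 'parsed' on gene and alteration,
    // or 0 if no stored event matches. Linear scan: O(n) per lookup.
    static long findExistingEventId(CnaEventStub parsed, List<CnaEventStub> stored) {
        for (CnaEventStub e : stored) {
            if (e.entrezGeneId == parsed.entrezGeneId && e.alteration == parsed.alteration) {
                return e.eventId; // take the id from the database copy, never from 'parsed'
            }
        }
        return 0L;
    }

    public static void main(String[] args) {
        // Stands in for the events returned by DaoCnaEvent.getAllCnaEvents().
        List<CnaEventStub> stored = Arrays.asList(
                new CnaEventStub(17L, 672L, -2),
                new CnaEventStub(18L, 7157L, 2));
        CnaEventStub parsed = new CnaEventStub(0L, 672L, -2); // as built from data_cna.txt
        System.out.println(parsed.eventId);                      // 0: the buggy source of the id
        System.out.println(findExistingEventId(parsed, stored)); // 17: the id to use
    }
}
```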

Instead of storing the CnaEvent.Event objects in a Java Set, this PR stores them in a Map that maps each event to itself. This allows the stored event to be retrieved with Map.get(), which is efficient (expected constant time, rather than a linear search). For this to work correctly, CnaEvent.Event must override equals() and hashCode() so that they compare only the gene and alteration fields (not event_id); those overrides were already in place. The Map was previously demoted to a Set in this PR: https://github.com/cBioPortal/cbioportal/pull/9847 This PR reverses that change and restores the HashMap lookup.
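A minimal sketch of the map-to-itself lookup, assuming a hypothetical simplified `KeyedEvent` class in place of `CnaEvent.Event` (the real fields and their types may differ). Because `equals()` and `hashCode()` use only the gene and alteration, a parsed event with `eventId == 0` works as a key for retrieving the stored instance that carries the real id.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for CnaEvent.Event; real fields and types may differ.
final class KeyedEvent {
    long eventId;            // assigned by the database; 0 for freshly parsed events
    final long entrezGeneId; // hypothetical gene identifier
    final int alteration;    // hypothetical copy-number code

    KeyedEvent(long eventId, long entrezGeneId, int alteration) {
        this.eventId = eventId;
        this.entrezGeneId = entrezGeneId;
        this.alteration = alteration;
    }

    // Equality uses gene + alteration only; eventId is deliberately excluded so a parsed
    // event (eventId == 0) is equal to its stored counterpart.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof KeyedEvent)) return false;
        KeyedEvent other = (KeyedEvent) o;
        return entrezGeneId == other.entrezGeneId && alteration == other.alteration;
    }

    // Must be consistent with equals(): same fields, eventId excluded.
    @Override
    public int hashCode() {
        return Objects.hash(entrezGeneId, alteration);
    }
}

final class EventIndex {
    // Maps each stored event to itself, so get() hands back the instance holding the eventId.
    private final Map<KeyedEvent, KeyedEvent> byKey = new HashMap<>();

    EventIndex(Iterable<KeyedEvent> storedEvents) {
        for (KeyedEvent e : storedEvents) {
            byKey.put(e, e);
        }
    }

    // Expected O(1) lookup; returns null if no matching event is stored yet.
    KeyedEvent findStored(KeyedEvent parsed) {
        return byKey.get(parsed);
    }

    public static void main(String[] args) {
        EventIndex index = new EventIndex(Arrays.asList(
                new KeyedEvent(17L, 672L, -2),
                new KeyedEvent(18L, 7157L, 2)));
        KeyedEvent stored = index.findStored(new KeyedEvent(0L, 7157L, 2));
        System.out.println(stored.eventId); // 18
    }
}
```

A plain HashSet only offers contains(), which says whether a matching event exists but cannot hand back the stored instance carrying its event_id; mapping each event to itself is what makes that retrieval possible.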

sheridancbio commented 8 months ago

Note: the approach of using the HashMap to retrieve event_id-populated objects by passing Map.get(key) a key that has no event_id was tested and appeared valid. However, an importer built with these code changes failed to obtain the event_id values as expected. Some additional debugging is needed; the cause is probably related to the equals() and hashCode() functions of the other types contained inside the CnaEvent.Event type.
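The suspicion in this note can be illustrated with a hypothetical example (one possible cause, not a confirmed diagnosis). If `Event.equals()` delegates to a field whose type does not itself override `equals()` and `hashCode()`, a freshly built key never matches the stored event, even though `Event` overrides both methods. All class names below are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class NestedEqualityDemo {
    // Hypothetical gene type WITHOUT equals/hashCode overrides: two instances with the
    // same id are never equal, because Object.equals compares references.
    static final class IdentityGene {
        final long entrezGeneId;
        IdentityGene(long entrezGeneId) { this.entrezGeneId = entrezGeneId; }
    }

    // Hypothetical gene type WITH value-based equals/hashCode.
    static final class ValueGene {
        final long entrezGeneId;
        ValueGene(long entrezGeneId) { this.entrezGeneId = entrezGeneId; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueGene && ((ValueGene) o).entrezGeneId == entrezGeneId;
        }

        @Override
        public int hashCode() { return Long.hashCode(entrezGeneId); }
    }

    // Event whose equals/hashCode ignore eventId and delegate to the gene field's own
    // equals/hashCode, so an event is only as "equal" as its gene type allows.
    static final class Event<G> {
        final long eventId;
        final G gene;
        final int alteration;

        Event(long eventId, G gene, int alteration) {
            this.eventId = eventId;
            this.gene = gene;
            this.alteration = alteration;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Event)) return false;
            Event<?> other = (Event<?>) o;
            return alteration == other.alteration && Objects.equals(gene, other.gene);
        }

        @Override
        public int hashCode() { return Objects.hash(gene, alteration); }
    }

    // Indexes 'stored' by itself and looks it up using 'parsed' as the key.
    static <G> Event<G> lookup(Event<G> stored, Event<G> parsed) {
        Map<Event<G>, Event<G>> index = new HashMap<>();
        index.put(stored, stored);
        return index.get(parsed);
    }

    public static void main(String[] args) {
        // Stored and parsed events each get their own gene object (an assumption about
        // how an importer might build them).
        Event<IdentityGene> a = lookup(new Event<>(17L, new IdentityGene(672L), -2),
                                       new Event<>(0L, new IdentityGene(672L), -2));
        Event<ValueGene> b = lookup(new Event<>(17L, new ValueGene(672L), -2),
                                    new Event<>(0L, new ValueGene(672L), -2));
        System.out.println(a == null ? "IdentityGene: miss" : "IdentityGene: hit"); // miss
        System.out.println(b == null ? "ValueGene: miss" : "ValueGene: hit");       // hit
    }
}
```

An enum-typed field would not cause this problem, because each enum constant is a single shared instance, so reference equality already behaves like value equality.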