Open inodb opened 1 year ago
I used a 20G heap (and would probably have needed a lot more) just to load a ~1.5G file (pog570_bcgsc_2020/data_mutations.txt). We shouldn't touch a single character of a line until we really need it: simply reading that file with a buffered reader takes only 3-5 secs.
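To illustrate the "don't touch a line until we need it" idea, here is a minimal sketch of streaming a file line by line without splitting or retaining anything. The class and method names (LazyLineReader, forEachLine) are hypothetical and not taken from the annotation pipeline; skipping blank lines and lines starting with '#' is also an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, not part of the existing code base.
class LazyLineReader {
    // Hands each raw line to the handler and returns how many lines were passed on.
    // Nothing is split into columns or stored here, so memory use stays flat
    // unless the handler itself decides to keep something.
    static long forEachLine(Reader source, Consumer<String> handler) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption for this sketch: blank lines and '#' comment lines are skipped.
                if (line.isEmpty() || line.charAt(0) == '#') {
                    continue;
                }
                handler.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

A caller would pass something like a FileReader for data_mutations.txt and a handler that only parses the columns it actually needs.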
The runtime problem is solved by #227. The memory problem could be solved by not using a giant Map<String, VariantAnnotation> gnResponseVariantKeyMap.
@inodb @rmadupuri @sheridancbio
For some reason we use a giant Map<String, VariantAnnotation> gnResponseVariantKeyMap. Do we really need it? I have my doubts...
Let me show you what's going on, step by step:
These steps suggest that there should be fewer OriginalVariantQuery entries than genomicLocations and that, for some reason, we should use the last inserted OriginalVariantQuery.
That sounds unnecessary to me. These steps should be converted into this:
Now the garbage collector can start cleaning up unused POST response data. We could also introduce multithreading (though without solving the memory issue first, that would be "meh").
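As a rough illustration of why dropping the big map lets the garbage collector reclaim response data, here is a generic batching sketch. It is not a reconstruction of the steps discussed in this thread: BatchAnnotator, annotateInBatches, fetchBatch and sink are all hypothetical names, and the actual POST call is abstracted as a plain function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper, not part of the existing code base.
class BatchAnnotator {
    // Sends queries in fixed-size batches and forwards each response to the sink
    // right away. No response is kept after its batch is handled, so it becomes
    // eligible for garbage collection as soon as the sink is done with it.
    static <Q, R> int annotateInBatches(Iterable<Q> queries, int batchSize,
            Function<List<Q>, List<R>> fetchBatch, Consumer<R> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int handled = 0;
        List<Q> batch = new ArrayList<>(batchSize);
        for (Q query : queries) {
            batch.add(query);
            if (batch.size() == batchSize) {
                handled += flush(batch, fetchBatch, sink);
            }
        }
        if (!batch.isEmpty()) {
            handled += flush(batch, fetchBatch, sink);
        }
        return handled;
    }

    private static <Q, R> int flush(List<Q> batch,
            Function<List<Q>, List<R>> fetchBatch, Consumer<R> sink) {
        // fetchBatch stands in for the POST request (an assumption of this sketch).
        // It gets a copy so that clearing the working batch cannot affect it.
        List<R> responses = fetchBatch.apply(new ArrayList<>(batch));
        for (R response : responses) {
            sink.accept(response); // e.g. write the annotated record out immediately
        }
        batch.clear();
        return responses.size();
    }
}
```

Because each batch is independent, batches could later be handed to a thread pool, but as noted above that only pays off once the memory footprint is under control.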
If these steps can't be changed, using a smaller version of VariantAnnotation 'might' help.
We can use this file to test:
https://github.com/cBioPortal/datahub/blob/master/public/difg_glass_2019/data_mutations.txt