Closed by AndrewZoldy 3 years ago
Hi Andrey, thanks for bringing this issue to our attention. You are right: currently the meta data is loaded line by line. That was not a problem before, but for data with 101266 rows it could be.
I am sure we can make some improvements to data importing. I will take a look in early October.
Just to confirm: only the data import is slow, right? Not data loading on the website?
Gaofei
Hello Gaofei, yes, the issue is only in the importing; the website works well. Thank you for your answer!
Good to know. I will look into this later and update the ticket.
Hi @AndrewZoldy, one performance improvement has been made. You can use a release version after 3.7.15 to get faster importing.
Hello, in my project the team has two files for the cBioPortal application which we are loading as "Generic Assay" data. Both files contain expression data measured with mass spectrometry at the site level, each for one site per file. One of them contains 18767 rows for 109 samples (plus a column with the gene and a column with the site name), and the second one has 101266 rows for 109 samples (plus gene and site columns). Processing the first one took around 5 hours, and the second around 25 hours. Meanwhile the proteomics data, which has a very similar data structure, was processed in 46 seconds (10275 rows, 109 samples).
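To put the timings above on a common scale, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers reported in this post and treats "around 5 hours" and "around 25 hours" as exact, so the results are estimates. The dictionary keys are just labels chosen for illustration.

```python
# Row counts and durations (in seconds) as reported in this thread.
# The hour figures are approximate in the original post.
runs = {
    "generic_assay_small": (18767, 5 * 3600),
    "generic_assay_large": (101266, 25 * 3600),
    "proteomics": (10275, 46),
}


def seconds_per_row(rows, seconds):
    """Average import time per data row."""
    return seconds / rows


per_row = {name: seconds_per_row(r, s) for name, (r, s) in runs.items()}
for name, value in per_row.items():
    print(f"{name}: {value * 1000:.1f} ms per row")

# How much slower one generic assay row is compared with one proteomics row.
slowdown = per_row["generic_assay_large"] / per_row["proteomics"]
print(f"slowdown vs proteomics: ~{slowdown:.0f}x")
```

Under these assumptions both generic assay files come out at a little under one second per row, while the proteomics file is in the range of a few milliseconds per row. A roughly constant per-row cost like this would fit a fixed overhead paid for every row, although it does not prove where that overhead comes from.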
We did a little investigation into the cBioPortal code, and it looks like for generic assay data the importer goes to the database for each row separately (https://github.com/cBioPortal/cbioportal/blob/master/core/src/main/java/org/mskcc/cbio/portal/scripts/ImportGenericAssayEntity.java#L188). If I'm wrong, could you please explain what the reason could be? Could we make any changes in our meta files to fix this?
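To illustrate the general difference between per-row and batched database writes, here is a minimal sketch in Python using an in-memory SQLite database. This is not the cBioPortal importer (which is Java); the table name `entity`, its columns, the generated rows, and the batch size are all hypothetical and chosen only for the example.

```python
import sqlite3


def make_db():
    # Hypothetical table, only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity (stable_id TEXT PRIMARY KEY, gene TEXT, site TEXT)"
    )
    return conn


def insert_row_by_row(conn, rows):
    # One statement and one commit per row: one round trip for every row.
    for row in rows:
        conn.execute("INSERT INTO entity VALUES (?, ?, ?)", row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    # Send rows in groups: one executemany call and one commit per group.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", batch)
        conn.commit()


# Made-up rows standing in for gene/site entities.
rows = [(f"id_{i}", f"GENE{i % 50}", f"S{i}") for i in range(5000)]

slow = make_db()
insert_row_by_row(slow, rows)

fast = make_db()
insert_batched(fast, rows)

# Both strategies store the same data; only the number of round trips differs.
count_slow = slow.execute("SELECT COUNT(*) FROM entity").fetchone()[0]
count_fast = fast.execute("SELECT COUNT(*) FROM entity").fetchone()[0]
print(count_slow, count_fast)
```

With an in-memory database the timing difference is small, but against a database server each per-row call typically adds network and transaction overhead, which is the kind of cost batching is meant to reduce.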
The meta files look as follows (same structure for both):
Best, Andrey