CellProfiler / CellProfiler-Analyst

Open-source software for exploring and analyzing large, high-dimensional image-derived data.
http://cellprofileranalyst.org

CPA does not export the classification results to database when more than 60000 image sets #314

Closed viksyn1 closed 2 years ago

viksyn1 commented 2 years ago

I have encountered an issue when writing the QC classification results for 69120 image sets (groups = negative, saturated, blurry) to a MySQL database. The table with class and class_number created by CPA did not include values for image sets below 60001. Smaller datasets work fine. I have attached the CPA .log file (CPA_log.txt) and a screenshot.

[Screenshot: MySQL classification table]

bethac07 commented 2 years ago

Considering the chunksize parameter is 10K, I strongly suspect the existing results are getting overwritten somehow. That doesn't fix it, but it may help in tracking it down.
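
As a rough sanity check on that suspicion, here is a small sketch of the arithmetic. It assumes the chunks are consecutive, 1-based ranges of 10,000 image numbers; `chunk_ranges` is a hypothetical helper written for this comment, not a function from CPA.

```python
def chunk_ranges(total, chunksize):
    """Yield inclusive, 1-based (start, end) ranges covering 1..total.

    Assumption: chunks are consecutive blocks of `chunksize` image numbers.
    """
    for start in range(1, total + 1, chunksize):
        yield start, min(start + chunksize - 1, total)


ranges = list(chunk_ranges(69120, 10_000))
last_start, last_end = ranges[-1]
print(f"{len(ranges)} chunks, last chunk covers {last_start}..{last_end}")
```

Under that assumption there are 7 chunks and the last one starts at image 60001, which lines up with the report that only image sets from 60001 onward made it into the table.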

bethac07 commented 2 years ago

@viksyn1 Are you running from source or from a built release? If from source, are you seeing these print statements? Just to help us narrow down which code path we are in.

viksyn1 commented 2 years ago

I was running CPA on my Windows PC and not from source, so I guess it was the built version. I have not seen any of those print statements, and the classification table was created successfully. I am adding the related image.sc topic: https://forum.image.sc/t/cpa-does-not-transfer-all-classification-results-to-mysql-database/59854/3. The main intriguing part is that "Any values that cannot be converted to float were set to 0".

DavidStirling commented 2 years ago

Hi @viksyn1,

Sorry I missed your initial post on this! Thanks for raising it again.

It looks like @bethac07 is right and there's a mistake in the 'chunking' code I added in a previous update. It's designed to use a much faster method of writing results into the database and then fall back to an older method if anything goes wrong. In testing I'd never been able to get the new method to fail so we didn't spot this, but I believe these lines are incorrectly duplicated from above and shouldn't be there. As a result it's wiping the table each time it switches to the new method.

Fixing this should be as simple as removing those lines and adjusting the range calls in the following statements to properly follow the chunk start/end positions. It'd be best to test this with a database that actually exhibits the issue though.
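
To illustrate the failure mode, here is a standalone sketch that uses `sqlite3` in place of MySQL. It is not the actual CPA code: the table name, column names, and function names are made up for illustration, and the `recreate_per_chunk` flag simply toggles the suspected duplicated setup step.

```python
import sqlite3

CHUNKSIZE = 10_000  # matches the chunk size discussed in this thread


def create_table(conn):
    # Hypothetical table layout, used only for this demonstration.
    conn.execute("DROP TABLE IF EXISTS class_table")
    conn.execute(
        "CREATE TABLE class_table "
        "(ImageNumber INTEGER PRIMARY KEY, class_number INTEGER)"
    )


def write_results(conn, labels, recreate_per_chunk):
    """Write labels in chunks; labels[i] belongs to image number i + 1."""
    create_table(conn)
    for start in range(0, len(labels), CHUNKSIZE):
        end = min(start + CHUNKSIZE, len(labels))
        if recreate_per_chunk:
            # Suspected bug: setup repeated inside the loop wipes earlier chunks.
            create_table(conn)
        # The row range follows the chunk start/end positions.
        rows = [(i + 1, labels[i]) for i in range(start, end)]
        conn.executemany("INSERT INTO class_table VALUES (?, ?)", rows)
    conn.commit()


def summary(conn):
    """Return (row count, smallest image number) for the table."""
    query = "SELECT COUNT(*), MIN(ImageNumber) FROM class_table"
    return conn.execute(query).fetchone()


labels = [i % 3 for i in range(69120)]  # placeholder class numbers
conn = sqlite3.connect(":memory:")

write_results(conn, labels, recreate_per_chunk=True)
buggy = summary(conn)

write_results(conn, labels, recreate_per_chunk=False)
fixed = summary(conn)

print("table recreated per chunk:", buggy)
print("table created once:", fixed)
```

With the table recreated on every chunk, only the final chunk survives, so the smallest image number left is 60001. Creating the table once and inserting each chunk by its own range keeps all 69120 rows.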

The error isn't obvious in your logs because the file you're providing only appears to capture PostMessage or logging statements, so the print statements are missed. Someone should fix that.
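
One possible way to close that gap, sketched with the standard library only: route anything written via `print` into a logger. This is an assumption about how it could be done, not how CPA's logging is set up; `LogWriter`, `ListHandler`, the logger name, and the message text are all placeholders.

```python
import contextlib
import io
import logging


class LogWriter(io.TextIOBase):
    """File-like object that forwards each complete printed line to a logger."""

    def __init__(self, logger, level=logging.INFO):
        super().__init__()
        self._logger = logger
        self._level = level
        self._pending = ""

    def write(self, text):
        # print() may call write() several times per line, so buffer
        # until a newline arrives before emitting a log record.
        self._pending += text
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            if line:
                self._logger.log(self._level, line)
        return len(text)


class ListHandler(logging.Handler):
    """Collects messages in memory; stands in for a real log file handler."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("cpa_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

with contextlib.redirect_stdout(LogWriter(logger)):
    print("falling back to", "slow write method")  # placeholder message

captured = handler.messages
```

Swapping `ListHandler` for a `logging.FileHandler` would send the same lines to a log file alongside the regular logging statements.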

viksyn1 commented 2 years ago

Hi @DavidStirling, great! The lines of code you have highlighted seem to be the cause of the problem. I have CPA installed on Windows, and searching in File Explorer I wasn't able to find the 'multiclasssql.py' file. Could you please tell me where the file might be, or how I can introduce the change to the code?

DavidStirling commented 2 years ago

You'd need to install CPA from the source code in order to try out this change.

I'll get around to putting together a proper pull request in time, but for now that would allow you to make the edit yourself.