cellarium-ai / cellarium-cas

Python client libraries for Cellarium Cloud Cell Annotation Service (CAS).
BSD 3-Clause "New" or "Revised" License

More on basic input validation #73

Open mbabadi opened 3 months ago

mbabadi commented 3 months ago

I anticipate that a lot of user error will be due to giving CAS normalized counts data. Unfortunately, we cannot rely on dtype for that: I have seen too many AnnData files with float32 that contain integer counts. Heck, I do that myself all the time :D

A quick and dirty validation is to sample x percent (x ~ 5 to 10) of the non-zero counts (easy if sparse, a bit more expensive if dense) and ensure that their fractional part is < 1e-3. Otherwise, raise an exception with an informative error message. We can also have a flag to disable the input data integrality validation (off by default, so validation runs unless the user opts out) for those who know what they're doing.
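For concreteness, here is a rough sketch of what such a check could look like. It is not existing CAS code: the function name `validate_integer_counts`, its parameters, and the error text are all hypothetical. It also measures the distance to the nearest integer rather than the raw fractional part, on the assumption that values like 2.9999 from float rounding should still pass.

```python
import numpy as np
import scipy.sparse as sp


def validate_integer_counts(X, sample_fraction=0.1, tol=1e-3, seed=0):
    """Hypothetical helper: raise ValueError if sampled non-zero entries look non-integer."""
    if sp.issparse(X):
        # For sparse input, the stored values are (mostly) the non-zero entries.
        values = X.tocsr().data
    else:
        # For dense input we have to scan the whole array, which costs more.
        values = np.asarray(X).ravel()

    # Drop zeros (dense input, or explicitly stored zeros in sparse input).
    values = values[values != 0]
    if values.size == 0:
        return  # nothing to check

    # Sample a fraction of the non-zero entries without replacement.
    rng = np.random.default_rng(seed)
    n_sample = max(1, int(sample_fraction * values.size))
    sample = rng.choice(values, size=n_sample, replace=False)

    # Distance to the nearest integer; raw counts should be (almost) exactly integral.
    deviation = np.abs(sample - np.rint(sample))
    if np.any(deviation >= tol):
        raise ValueError(
            "Input does not look like raw integer counts: sampled non-zero values "
            "have non-integer parts. CAS expects unnormalized counts; if the data "
            "were normalized or log-transformed, please pass the raw counts instead."
        )


# Integer counts stored as float32 pass, whether dense or sparse.
counts = np.array([[0, 1, 2], [3, 0, 5]], dtype=np.float32)
validate_integer_counts(counts)
validate_integer_counts(sp.csr_matrix(counts))
```

A caller-side flag to skip the check would simply guard the call to this function. Because only a fraction of entries is sampled, a matrix with a handful of non-integer values can slip through, which seems acceptable for catching wholesale normalization.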

mdmanurung commented 1 week ago

I upvote this. The error message is not informative, though I can find out what's wrong by going through the provided vignette.