VEuPathDB / EdaSubsettingService

A REST service to provide data and subsetting in the Exploratory Data Analysis Workspace
Apache License 2.0
0 stars 0 forks source link

Subsetting `distribution` histogram service is missing some data due to rounding of variable.rangeMin #29

Closed bobular closed 3 years ago

bobular commented 3 years ago

I know we've had similar discussions w.r.t. to variable.rangeMin (see below) but I think this is a separate one.

Here's how to reproduce the issue:

So the issue here is that the histogram returned by distribution is missing some counts, most likely due to the rounding of the binStart

ryanrdoherty commented 3 years ago

@bobular Is this still an issue? If so, how do you suggest we fix? Looks like if, for non-integer numbers, we (i.e. either Java code or loading team) adjust the min and max to be lower/higher respectively by 0.00001, we will pick up the extra values? If this is the solution, at what stage do you think we should perform the adjustment? Also tagging @steve-fischer-200 since he may have an opinion.

steve-fischer-200 commented 3 years ago

@ryanrdoherty why is leftmost binStart not equal to rangeMin?

bobular commented 3 years ago

As I commented on https://epvb.slack.com/archives/C01894F7P89/p1632869876261800?thread_ts=1632851456.237200&cid=C01894F7P89 - I think this is not a problem with the distribution service. I think there's still an issue with the rounding of variable annotations rangeMin and rangeMax

This ticket may need revisiting https://github.com/VEuPathDB/EdaDataService/issues/76

ryanrdoherty commented 3 years ago

Current plan is to see if the improvements to the subsetting service (integer refinement and float -> double conversions) solve this. Will comment again for QA when these changes are merged to main.

ryanrdoherty commented 3 years ago

Just checked this on QA. Looks like the recent improvements to floating-point number handling have fixed this issue. We now see all records that have values when selecting the whole graph, and the range (lower bound) is correct.

aaronwlsong commented 3 years ago

As @ryanrdoherty said, it looks good now. Closing ticket.

image