VEuPathDB / EdaSubsettingService

A REST service to provide data and subsetting in the Exploratory Data Analysis Workspace
Apache License 2.0
0 stars 0 forks source link

Variable distribution request with too small bin width blows out system memory #53

Closed chowington closed 2 years ago

chowington commented 2 years ago

Example URL & payload:

https://localhost:3000/eda-subsetting-service/studies/GEMSCC1A03-1/entities/EUPATH_0000738/variables/OBI_0001169/distribution

{"valueSpec":"count","filters":[],"binSpec":{"displayRangeMin":0,"displayRangeMax":62,"binWidth":0.062}}

This is the bin width that creates 1,000 bins for this range.

(I believe this is the correct request, but I didn't test it because it would blow up the system again :)

moontrip commented 2 years ago

@chowington I think that your example is Age variable at GEMS1. Basically Histogram Viz is quite similar to Histogram filter (Subsetting) and the Viz has the range from 0.1 to 31 (based on bin width slider), while the filter has the range of 0.062 to 62. I didn't carefully check the codes, but perhaps Viz already had a limit in there?

chowington commented 2 years ago

Close, DK! It's Age at GEMS1A, at least that's the example I've been using from the original frontend ticket: https://github.com/VEuPathDB/web-eda/issues/654

moontrip commented 2 years ago

@chowington Close! 😄 I looked at a bit more just out of curiosity, and found that Viz backend response returns binSlider info via response.histogram.config.binSlider while distribution does not have that luxury. So just pure my guess is that if you ask Danielle about the logic to compute binSlider (probably done by R) then the same/similar logic may be used in the client for Distribution? It would be helpful for matching the bin slider controls for both Histogram Filter and Viz.

chowington commented 2 years ago

Right, Bob mentioned in the ticket that the subetting histogram didn't receive bin width ranges from the backend, unlike the histogram viz. He wrote in the ticket: "IIRC the distribution endpoint that drives subsetting is stripped down and does not calculate things like this for a reason." Whatever that reason might be is above my paygrade :)

Edit: But I see, you're saying we can duplicate that logic on the frontend ourselves. Possibly! I can ask Danielle about it on the other ticket 👍

ryanrdoherty commented 2 years ago

This was caused by an infinite loop in the FgpUtil distribution library, triggered when a client submits a distribution bin size between 0 and 1 on an integer variable. The type coercion converted this to a bin size of zero, meaning additional bins never changed range. It was fixed in the underlying library, which now throws a 400 in the offending case and a small improvement to the error message was added to this project.