BiologicalRecordsCentre / ABLE

Assessing ButterfLies in Europe project repository

Counts with large numbers not uploaded? #702

Open chrisvanswaay opened 1 week ago

chrisvanswaay commented 1 week ago

@kazlauskis I just got a report that counts with >400 Maniola jurtina are not uploaded, failing with the error 'json parse error: unrecognised token '<'' for the account 'Defensie, Alle vogelwachten' (VogelwachtDefensie). Other counts before and after were uploaded fine. Is there a limit to the number of butterflies? Species like M. jurtina can be extremely abundant.

kazlauskis commented 5 days ago

@johnvanbreda I can see a number of log entries where a record was successfully uploaded (2xx status), but the response body is invalid. The app is quite likely getting back an HTML response instead of JSON, but I'm not sure why. Most are 200 codes, but some are 201.

27 June 14:46:36 (BST) POST https://warehouse1.indicia.org.uk/index.php/services/rest/samples [201]
27 June 07:54:54 (BST) POST https://warehouse1.indicia.org.uk/index.php/services/rest/samples [200]
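
For illustration, a minimal sketch of how the app could detect an HTML body hiding behind a 2xx status before attempting to parse it; `uploadSample` and the error message are hypothetical, not the app's actual code:

```typescript
// Hypothetical sketch: guard against a 2xx response whose body is not JSON.
async function uploadSample(payload: object): Promise<unknown> {
  const res = await fetch(
    'https://warehouse1.indicia.org.uk/index.php/services/rest/samples',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  );

  const text = await res.text();
  try {
    return JSON.parse(text);
  } catch {
    // A leading '<' means we were handed HTML (e.g. an error page) despite
    // the 2xx status; surface that instead of a bare JSON parse error.
    throw new Error(
      `Expected JSON from warehouse but got ${res.status} with body starting "${text.slice(0, 20)}"`,
    );
  }
}
```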

chrisvanswaay commented 2 days ago

@kazlauskis @johnvanbreda If the system cannot deal with >500 butterflies (which can occur with species such as Maniola jurtina), then we should notify the recorder when the limit is close (e.g. with a popup) and advise them to stop the count (you don't have to count the full 15 min) and start a new one.
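
A quick sketch of the suggested warning, assuming a hypothetical `showPopup` helper and a 400-record warning threshold (the actual limit is still unconfirmed):

```typescript
// Illustrative names only; COUNT_WARN_THRESHOLD and showPopup are not part
// of the app. Warn before the suspected ~500-occurrence limit is reached.
const COUNT_WARN_THRESHOLD = 400;

function onCountChanged(
  totalIndividuals: number,
  showPopup: (msg: string) => void,
): void {
  if (totalIndividuals >= COUNT_WARN_THRESHOLD) {
    showPopup(
      'This count is getting very large. Consider stopping it now ' +
        '(the full 15 min is not required) and starting a new count.',
    );
  }
}
```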

johnvanbreda commented 2 days ago

@kazlauskis are your timestamps the end of the request or the beginning? Looking at the logs I think they are the moment the app received the response rather than the moment the request was sent. If so, I can see that the first example (14:46) was a request that created 273 occurrences in 16 seconds, which is a little slow, but my logging implies it succeeded. This was followed by the same sample being re-submitted, resulting in a 409 conflict.

I wonder if the app should page the data if submitting more than ~100 occurrences?
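
As a rough illustration of that paging idea, assuming a hypothetical `postOccurrences` helper and a 100-record page size; treating a 409 on re-submission as "already uploaded" would also avoid the duplicate-conflict loop described above:

```typescript
// Sketch only: split occurrences into pages and upload them sequentially.
async function uploadInPages<T>(
  occurrences: T[],
  postOccurrences: (page: T[]) => Promise<Response>,
  pageSize = 100,
): Promise<void> {
  for (let i = 0; i < occurrences.length; i += pageSize) {
    const page = occurrences.slice(i, i + pageSize);
    const res = await postOccurrences(page);
    // A 409 on retry suggests the page already exists on the warehouse,
    // so it can be treated as success rather than re-submitted.
    if (!res.ok && res.status !== 409) {
      throw new Error(`Page ${i / pageSize + 1} failed with ${res.status}`);
    }
  }
}
```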

kazlauskis commented 6 hours ago

@johnvanbreda These are the end times.

> I wonder if the app should page the data if submitting more than ~100 occurrences?

Yes, we could introduce more controlled record uploads in the app, so that it first uploads the top sample and then its sub-samples in a paginated way. From the user's perspective this would fix the max-occurrences issue, but record upload times would be longer. So I wonder if there is something we could do on the warehouse side to speed things up and process, say, 500 sub-samples in under 10 seconds? Is it the external_key lookup that's causing the delay? Or is that part already optimised and we can't push it further?
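
A minimal sketch of that controlled upload flow, assuming hypothetical `postSample`/`postSubSamples` helpers; the real app and REST API shapes may differ:

```typescript
// Sketch only: post the top sample first, then its sub-samples in
// fixed-size batches so each request stays well within timeout limits.
async function uploadSampleTree(
  topSample: object,
  subSamples: object[],
  postSample: (s: object) => Promise<{ id: string }>,
  postSubSamples: (parentId: string, batch: object[]) => Promise<void>,
  batchSize = 100,
): Promise<void> {
  // 1. Create the parent sample and capture its warehouse id.
  const { id: parentId } = await postSample(topSample);

  // 2. Upload sub-samples sequentially in small batches, linked to the
  //    parent, instead of one huge request that risks timing out.
  for (let i = 0; i < subSamples.length; i += batchSize) {
    await postSubSamples(parentId, subSamples.slice(i, i + batchSize));
  }
}
```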