BiologicalRecordsCentre / UKBMS-online

Issue tracking for UKBMS online recording site

Size Limit on Downloads? #343

Closed IanMiddlebrook closed 6 months ago

IanMiddlebrook commented 7 months ago

Hi @DavidRoy,

We don't seem to be able to download the full data any more. Even for the Samples download which is not the biggest file (just the list of walks) I'm getting this message:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /srv/sites/warehouse1.indicia.org.uk/system/libraries/drivers/Database/Pgsql.php on line 452

I think the memory limit needs to be increased?

This will certainly be a big issue now we're coming to validate the full set for 2023.

Thanks, Ian

DavidRoy commented 7 months ago

@johnvanbreda are you able to pick this one up? Bounce to Andy or Gary if appropriate, and involve Jim/Biren as required

johnvanbreda commented 7 months ago

@DavidRoy I don't have an admin login for UKBMS anymore. Rather than increase memory limits, is switching to Elasticsearch for downloads going to be feasible?

DavidRoy commented 7 months ago

@johnvanbreda I've given you admin rights. I can't remember the reason, but these reports were not converted to Elasticsearch. I'd be happy for the work to be done if it won't take too long. There is some urgency for Ian to have this download.

johnvanbreda commented 7 months ago

Thanks David. @IanMiddlebrook please can you confirm which page you are trying to download from as I'm a little unfamiliar with the facilities currently on the site.

IanMiddlebrook commented 7 months ago

Hi @johnvanbreda

It's the Annual Summary page (Reporting menu) - load filters to 'combine data for all recorders' and 'combine data for all sites'. Then the two downloads I would need are Samples and Occurrences. Prior to the recent server work, I've been able to download these OK.

Thanks, Ian

DavidRoy commented 7 months ago

@johnvanbreda this page: https://ukbms.org/annual-summary

The reports are these, I think, but they have long URLs:
projects/ukbms/ukbms_sample_download.xml
projects/ukbms/ukbms_occurrence_download_confidential.xml

I wonder if there is a better way of delivering these downloads for Ian who needs all the data for a given year (or even across years). The report files could be simpler by removing filter parameters? The download files could also be made available via a separate downloads page as we do for the eBMS - https://butterfly-monitoring.net/downloads

johnvanbreda commented 7 months ago

@DavidRoy The issue here is the quantity of data being loaded into server RAM whilst building the download. There are a few options - reviewing the code to page through the data more efficiently or switching to the REST API which already does this, or switching completely to use Elasticsearch. However the quickest fix is going to be an alteration to the server configuration to allow bigger files - I've asked Damian to do this and he plans to look at it later today.
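To illustrate the "page through the data" option, here is a minimal sketch in Python rather than the warehouse's own PHP. It is not the Indicia implementation: the `samples` table, its columns, and the `stream_csv` and `demo` helpers are all hypothetical, and an in-memory SQLite database stands in for the real PostgreSQL one. The idea is simply that each query fetches one bounded page and writes it out before asking for the next, so memory use depends on the page size rather than on the size of the full download.

```python
import csv
import io
import sqlite3


def stream_csv(conn, out, page_size=1000):
    """Write every row of the hypothetical `samples` table to `out` as CSV.

    Uses keyset pagination: each query returns at most `page_size` rows
    with an id greater than the last one already written.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "site", "date"])
    last_id = 0
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, site, date FROM samples "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        writer.writerows(rows)   # only one page is held in memory here
        last_id = rows[-1][0]    # resume after the last id written
        total += len(rows)
    return total


def demo(n_rows=2500, page_size=1000):
    """Build a throwaway table and stream it out; returns (row count, CSV text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (id INTEGER PRIMARY KEY, site TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO samples (site, date) VALUES (?, ?)",
        [(f"site-{i % 7}", "2023-06-01") for i in range(n_rows)],
    )
    out = io.StringIO()  # stand-in for a file or HTTP response stream
    total = stream_csv(conn, out, page_size)
    return total, out.getvalue()
```

In a real download `out` would be the response stream or a file on disk, so the finished CSV never has to sit in RAM in one piece; the `StringIO` buffer is only there to keep the sketch self-contained.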

johnvanbreda commented 7 months ago

@IanMiddlebrook the memory limit has been increased and the reports seem OK to me at the moment. Please can you try them again and let me know how you get on?

IanMiddlebrook commented 7 months ago

Hi @johnvanbreda I've managed to download all the WCBS-BBS and WCBS-BC data for 2023. But for Transect data, I'm still getting exactly the same fatal error message (same allowed memory size), for both Samples and Occurrences.

IanMiddlebrook commented 7 months ago

Hi @johnvanbreda

I've had a message from a branch co-ordinator over the weekend - they're getting the same error message trying to get the 'Section Level' download for their branch data.

Thanks, Ian

johnvanbreda commented 7 months ago

@IanMiddlebrook I've sent you an email containing the 2023 data as a temporary solution - let me know if not received.

IanMiddlebrook commented 7 months ago

thanks @johnvanbreda - all received

IanMiddlebrook commented 6 months ago

Hi @johnvanbreda

Thanks for the downloads as a temporary fix for myself, but there are still issues for Co-ordinators of some larger branches being unable to download their data at a critical time of year. Is the memory limit still being investigated?

Thanks, Ian

johnvanbreda commented 6 months ago

Hi @IanMiddlebrook, please can you find out which branch coordinator it was and exactly what the settings were when they tried a section level download?

IanMiddlebrook commented 6 months ago

Hi @johnvanbreda
It was Bob Annell - the account would be 'Hampshire UKMBS' (user 3599) and he would set up filter by recorder to 'Branch Data' and filter by site to 'Combine data for all my branch sites'.

johnvanbreda commented 6 months ago

Thanks @IanMiddlebrook. I have a new version of the code coming which will reduce the memory usage during a download. I'll let you know when ready.

johnvanbreda commented 6 months ago

@IanMiddlebrook I've deployed the memory fix and successfully downloaded the section level branch data for Hampshire. Let me know if you have any more problems but the change should significantly reduce the memory consumption so hopefully it will work for all the downloads again.

IanMiddlebrook commented 6 months ago

Thanks @johnvanbreda

I'll let the Hampshire Co-ordinator know - I also managed that download.

I've now been able to download the 'Samples' for all transect data, but the big one - the Occurrences - was not successful. I thought it was, as it produced a CSV file, but that file only contained the error message:

Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 65028096 bytes) in /srv/sites/warehouse1.indicia.org.uk/modules/indicia_svc_data/controllers/data_service_base.php on line 423
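For scale, the byte counts in this message can be converted to MiB with a quick calculation. The snippet is only a convenience sketch and `to_mib` is a made-up helper name. It shows that the reported limit is the same 128 MiB as in the first error message, and that this time a single allocation of roughly 62 MiB was requested.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


print(to_mib(134217728))  # allowed memory size -> 128.0
print(to_mib(65028096))   # attempted allocation -> 62.015625
```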

johnvanbreda commented 6 months ago

Hi @IanMiddlebrook, I've found a further memory optimisation and I've now managed to download the occurrences file. Please can you let me know if that works now?

IanMiddlebrook commented 6 months ago

That's great thanks @johnvanbreda - I've managed to download all occurrences

For some reason there are three additional/superfluous columns in that download now, without headings - between the Date and the Section No.

Not a problem as I can delete them, but looks a bit odd.

Thanks, Ian

johnvanbreda commented 6 months ago

@IanMiddlebrook I did have to fiddle with the way that dates are added into the report when optimising the memory usage, and there was a mistake in the logic which replaced the constituent date fields (date start, end, etc.) with a single date string. That is now fixed, so the extra fields should be removed from the report.

Please close if now OK.
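As a rough illustration of the date handling described above, the following sketch replaces a row's constituent date fields with one date string and drops the originals, so no stray columns are left behind. It is not the warehouse code: the field names `date_start`, `date_end` and `date_type`, the output format, and the `collapse_date_fields` helper are all assumptions made for the example.

```python
# Hypothetical names for the constituent date columns.
DATE_PARTS = ("date_start", "date_end", "date_type")


def collapse_date_fields(row):
    """Return a copy of `row` with the date parts replaced by one `date` string.

    The combined value takes the position of the first date part, and the
    parts themselves are omitted from the result.
    """
    start, end = row.get("date_start"), row.get("date_end")
    if start and end and start != end:
        date = f"{start} to {end}"   # a date range
    else:
        date = start or end or ""    # a single day, or unknown
    out = {}
    for key, value in row.items():
        if key in DATE_PARTS:
            out.setdefault("date", date)  # emit once; skip the part itself
        else:
            out[key] = value
    return out
```

Forgetting to skip the constituent fields after adding the combined string is the kind of slip that would leave extra columns between the date and the section number.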

IanMiddlebrook commented 6 months ago

That's great - thanks @johnvanbreda