Closed: GraemeWatt closed this issue 10 months ago.
Rather than only loading the first 50 rows of a data table, a simpler approach would be to check the file size before loading a data table. If it is greater than some (configurable) value like 1 MB, the `independent_variables` and `dependent_variables` could be set to empty lists `[]` and a message displayed like: "This is a large data table (x.y MB). Do you want to display it?" If the user clicked "Yes", the data table would be loaded in the usual way. YAML data files are now restricted to be less than 10 MB as part of the validation, but this was not always the case. Some examples of large data tables:
- https://www.hepdata.net/record/ins1798511?version=1&table=Table%2022-23%20statistical%20correlations
- https://www.hepdata.net/record/ins1630886?version=3&table=Table%204
- https://www.hepdata.net/record/ins1740909?version=2 (Tables 185 to 188)
A related problem is with large text files attached as additional resources, either to a record or to an individual table, which are rendered in a browser via `resource_details.html`. Here, no restriction on size is made as part of the validation. A check should be made on the file size, and if it is greater than some (configurable) value like 1 MB, the file should only be made available for download but not rendered in a browser. Examples:
- https://www.hepdata.net/record/ins2013051?version=1 (large YAML files attached as additional publication resources)
- https://www.hepdata.net/record/ins2077557?version=1 (HistFactory JSON files attached as additional publication resources)
@ItIsJordan : The first table (1.6 MB) of https://www.hepdata.net/record/ins1869138?version=2 was reported by a user as causing problems due to its large size, so it might provide a good example for testing while not being excessively large.
We agreed last time we met in my office to change the limit `SIZE_LOAD_CHECK_THRESHOLD = 1000000` to the binary equivalent of one megabyte, `SIZE_LOAD_CHECK_THRESHOLD = 1048576`. For testing purposes, you could temporarily reduce this value to enable testing with a reasonably small data table.
Instead of "This table is too large to load automatically.", I prefer my suggested text from the comment above: "This is a large data table (x.y MB). Do you want to display it?" where x.y MB is replaced by the table size.
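The suggested message could be built like this, using the binary megabyte (1048576 = 1024²) mentioned above; the helper name `large_table_message` is hypothetical.

```python
def large_table_message(size_bytes):
    """Format the confirmation message with the table size in binary MB,
    e.g. 'This is a large data table (1.6 MB). Do you want to display it?'."""
    size_mb = size_bytes / 1048576  # 1024 ** 2, the binary megabyte
    return (
        f"This is a large data table ({size_mb:.1f} MB). "
        "Do you want to display it?"
    )
```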
I forgot that there's a second part of this issue to suppress display of large additional resource text files. See the second paragraph of the comment above beginning "A related problem....".
For very large tables possibly containing thousands of rows, currently only the first 50 rows are displayed initially and the user needs to click to see all rows. But all rows are initially loaded by the browser and plotted, which can result in a long delay (of a minute or more) when initially loading a record and when switching between tables. Instead, only the first 50 rows should be loaded and plotted initially, then the user would need to click to load all rows and display them in the table and on the generated plot.
In more detail, `get_table_details` would first call `generate_table_structure` to return only the first 50 rows, by in turn calling `process_independent_variables` and `process_dependent_variables` for only the first 50 rows. Then only the first 50 rows would be rendered in the table and plot. If the user clicked "Show All values", the web page would be reloaded and `get_table_details` would load all rows of the table.

Alternatively, don't load, display and visualise a table by default, but only when the user clicks a button. Or, more simply, just impose some maximum cut-off on the number of rows for a table to be rendered in a browser.