IQSS / dataverse

Open source research data repository software
http://dataverse.org

Standardize the image_url field of the Search API so that it uses regular URLs instead of base64 for all result types #10831

Closed: GPortas closed this issue 1 month ago

GPortas commented 1 month ago

Overview of the Feature Request

Currently, image_url returns base64 URLs for files and dataverses, while it returns a regular URL for datasets. The goal is to standardize all URLs to use the same format. We have chosen to use regular URLs instead of base64 for all cases.
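To illustrate the inconsistency from an API client's point of view, here is a minimal Python sketch. The helper name `image_url_kind` and the sample values are hypothetical, and it assumes the base64 values arrive as `data:` URLs, which this issue does not spell out.

```python
from urllib.parse import urlparse


def image_url_kind(image_url: str) -> str:
    """Classify an image_url value as 'data' (inline base64) or 'http' (regular URL)."""
    scheme = urlparse(image_url).scheme.lower()
    if scheme == "data":
        return "data"
    if scheme in ("http", "https"):
        return "http"
    return "unknown"


# Hypothetical sample values for illustration; real responses will differ.
file_result = {
    "type": "file",
    "image_url": "data:image/png;base64,iVBORw0KGgo=",
}
dataset_result = {
    "type": "dataset",
    "image_url": "http://localhost:8080/api/datasets/2/logo",
}

for result in (file_result, dataset_result):
    # Today a client has to branch on the kind; with regular URLs everywhere
    # this check would no longer be needed.
    print(result["type"], image_url_kind(result["image_url"]))
```

With a single format, a client could treat every image_url the same way regardless of result type.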

What kind of user is the feature intended for? API User

What inspired the request?

https://github.com/IQSS/dataverse/pull/10811#discussion_r1750411773

What existing behavior do you want changed?

Use regular URLs in image_url fields for all result types of the Search API.

Any brand new behavior do you want to add to Dataverse?

None

Any open or closed issues related to this feature request?

https://github.com/IQSS/dataverse/pull/10811

Are you thinking about creating a pull request for this feature?

Yes

stevenwinship commented 1 month ago

@GPortas @pdurbin Dataverses and Datasets have thumbnail images designated by the /logo at the end of the URL. Files do not have these. If the image_url is changed from base64 to the URL of the file, then the response will have both "url" and "image_url" with the same value. Are you asking for a new API for files to get a "logo" (i.e., /api/files/{id}/logo), or is the current /api/access/datafile endpoint all you want to see? Changing the image_url breaks backward compatibility, and since the access URL is already there, I'm wondering why this is needed.

Here is an example of what the change would look like:

{ "name": "bird.jpg", "type": "file", "url": "http://localhost:8080/api/access/datafile/3", "image_url": "http://localhost:8080/api/access/datafile/3", "file_id": "3",

{ "name": "test1", "type": "dataset", "url": "https://doi.org/10.5072/FK2/ARNIUJ", "image_url": "http://localhost:8080/api/datasets/2/logo", "global_id": "doi:10.5072/FK2/ARNIUJ",

qqmyers commented 1 month ago

Not sure I'm following the whole discussion but FWIW: files have thumbnail URLs like https://demo.dataverse.org/api/access/datafile/2378029?imageThumb=true which is what is used within the dataset page file table.

stevenwinship commented 1 month ago

So, are you saying the url and image_url should look like this:

"url": "http://localhost:8080/api/access/datafile/3", "image_url": "http://localhost:8080/api/access/datafile/3?imageThumb=true",
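As a rough sketch of how the two values relate, the thumbnail URL is just the access URL with the `imageThumb=true` query parameter appended. The function names below are hypothetical and this is not how the server builds the response; it only mirrors the URL shapes quoted in this thread.

```python
from urllib.parse import urlencode


def file_access_url(base_url: str, file_id: int) -> str:
    """Build the datafile access URL, following the pattern shown in this thread."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"


def file_thumbnail_url(base_url: str, file_id: int) -> str:
    """Build the thumbnail URL by appending imageThumb=true to the access URL."""
    query = urlencode({"imageThumb": "true"})
    return f"{file_access_url(base_url, file_id)}?{query}"


# Mirrors the "url" and "image_url" pair proposed above for file id 3.
search_item = {
    "url": file_access_url("http://localhost:8080", 3),
    "image_url": file_thumbnail_url("http://localhost:8080", 3),
}
print(search_item)
```

With this shape, "url" and "image_url" no longer carry identical values for files.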

qqmyers commented 1 month ago

I think that would work. Whether it makes sense to provide a new /logo URL with no parameters, which might let browsers cache the result, is perhaps a separate question. (FWIW: if S3 direct storage is used, any URL is going to be a redirect to the S3 object, possibly a signed URL if the file is a draft. I don't know how caching works in a case like that.) Perhaps just using the existing URL is enough, and we can see whether getting these small images is really a performance issue these days. We're somewhat guessing that replacing the base64 images doesn't slow the existing UI noticeably (if at all), so just doing a small tweak to be able to test that might be a good start.
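For a sense of the payload side of that trade-off, here is a small Python sketch comparing an inline base64 thumbnail with a regular URL. The 3000-byte thumbnail size is a made-up figure, and this says nothing about the cost of the extra HTTP request each URL implies; it only shows that base64 encodes every 3 bytes as 4 characters.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of data."""
    return 4 * ((n_bytes + 2) // 3)


# Hypothetical thumbnail size; real thumbnails vary.
thumbnail = bytes(3000)
encoded = base64.b64encode(thumbnail)

# URL shape taken from the example earlier in this thread.
thumb_url = "http://localhost:8080/api/access/datafile/3?imageThumb=true"

print("raw bytes:", len(thumbnail))
print("base64 characters in the JSON response:", len(encoded))
print("URL characters in the JSON response:", len(thumb_url))
```

The search response itself shrinks with regular URLs, while the image bytes move to separate requests, which is the part that would need testing in the UI.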

GPortas commented 1 month ago

@stevenwinship

This is a backward-incompatible change, but only relative to the updates introduced in PR #10811. Image URLs were not returned via the API before that PR, as its description notes:

"The image_url field was already included in the SolrSearchResult JSON, but it wasn’t returned by the API because it was appended only after the Solr query was processed in the SearchIncludeFragment of JSF. Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available."

So, if no version of Dataverse has been released that returns image_url in the Search API results, there shouldn't be anything to break.

A separate topic is whether, after switching to regular URLs, we should also remove the image_url values set in SearchIncludeFragment so that JSF stops using base64 as well. Regardless of what SearchServiceBean sets, JSF sets base64 images after the Solr search: https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/search/SearchIncludeFragment.java#L1425