WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Some Jamendo tracks do not allow downloads #3499

Open stacimc opened 9 months ago

stacimc commented 9 months ago

Description

Some Jamendo records with CC licenses nevertheless do not allow audio downloads from the Jamendo site.

Thanks @fcoveram for reporting

Reproduction

  1. Check out this audio result in Openverse: https://openverse.org/audio/74d3a2a9-88ce-4232-b165-2ac8612dd0c5?q=jazz
  2. Click the Get this audio button to be taken to the record's page on Jamendo
  3. On the Jamendo page, click the Free Download button
  4. Observe that the download button is disabled with the message "Download not available: This artist has disabled the download of this track on Jamendo Music."

Screenshots

[Screenshot (2023-12-08): audio result on Openverse]

[Screenshot (2023-12-08): audio result on Jamendo]

[Screenshot (2023-12-08): Jamendo download page with downloads disabled]

Additional context

Tracks returned by the Jamendo API include an `audiodownload_allowed` field, separate from the license, which Openverse does not currently use. For some records with CC licenses, this field is nonetheless `false`.

Here is an example API response from Jamendo for a record that has a CC license, but where `audiodownload_allowed` is `false`:

```json
{
  "id": "1687",
  "name": "Rioting",
  "duration": 235,
  "artist_id": "272",
  "artist_name": "Chroma",
  "artist_idstr": "chroma",
  "album_name": "first",
  "album_id": "248",
  "license_ccurl": "http://creativecommons.org/licenses/by-nc-sa/2.0/fr/",
  "position": 1,
  "releasedate": "2005-07-08",
  "album_image": "https://usercontent.jamendo.com?type=album&id=248&width=300&trackid=1687",
  "audio": "https://prod-1.storage.jamendo.com/?trackid=1687&format=mp32&from=YG3K4BmvxKBdtt5swAthcw%3D%3D%7CnQCkm%2FibREubzaVo%2FkDYPw%3D%3D",
  "audiodownload": "https://prod-1.storage.jamendo.com/download/track/1687/mp32/",
  "prourl": "",
  "shorturl": "https://jamen.do/t/1687",
  "shareurl": "https://www.jamendo.com/track/1687",
  "lyrics": "",
  "audiodownload_allowed": false,
  "image": "https://usercontent.jamendo.com?type=album&id=248&width=300&trackid=1687",
  "musicinfo": {
    "vocalinstrumental": "instrumental",
    "gender": "neutral",
    "acousticelectric": "electric",
    "speed": "medium",
    "tags": {
      "genres": ["electronic"],
      "instruments": [],
      "vartags": ["neutral"]
    }
  },
  "licenses": {
    "ccnc": "true",
    "ccnd": "false",
    "ccsa": "true",
    "prolicensing": "false",
    "probackground": "false",
    "cc": "true"
  },
  "stats": {
    "rate_downloads_total": 462,
    "rate_listened_total": 3003,
    "playlisted": 13,
    "favorited": 9,
    "likes": 0,
    "dislikes": 0,
    "avgnote": 0,
    "notes": 0
  }
}
```
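A minimal Python sketch of how an ingester might read the two signals in a track like this one. The helper names are hypothetical, and treating a missing `audiodownload_allowed` as "allowed" is an assumption. Two details from the response itself matter here: the `licenses` flags are the strings `"true"`/`"false"` while `audiodownload_allowed` is a JSON boolean, and the `audiodownload` URL is present even though downloads are disabled, so its presence cannot stand in for the flag.

```python
import json

# Trimmed copy of the Jamendo track above, keeping only the fields used here.
SAMPLE_TRACK = json.loads(
    """
    {
      "id": "1687",
      "license_ccurl": "http://creativecommons.org/licenses/by-nc-sa/2.0/fr/",
      "audiodownload": "https://prod-1.storage.jamendo.com/download/track/1687/mp32/",
      "audiodownload_allowed": false,
      "licenses": {"ccnc": "true", "ccnd": "false", "ccsa": "true", "cc": "true"}
    }
    """
)


def is_cc_licensed(track: dict) -> bool:
    # The "licenses" flags are JSON strings ("true"/"false"), not booleans.
    return track.get("licenses", {}).get("cc") == "true"


def download_allowed(track: dict) -> bool:
    # "audiodownload_allowed" is a real JSON boolean. A missing field is treated
    # as allowed here (an assumption), so partial responses are not dropped.
    return bool(track.get("audiodownload_allowed", True))


if __name__ == "__main__":
    print(is_cc_licensed(SAMPLE_TRACK), download_allowed(SAMPLE_TRACK))  # → True False
    # The download URL is still present even though downloads are disabled.
    print("audiodownload" in SAMPLE_TRACK)  # → True
```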

We could update the Jamendo DAG to exclude records with `audiodownload_allowed` disabled. To backfill this change to records that have already been ingested, a multi-step plan like the following could be used:

  1. First, merge a change to the Jamendo DAG to write the `audiodownload_allowed` field to the `meta_data` column for all records.
  2. Run the Jamendo DAG. Because it is a non-dated DAG, a single run should write that field for all records. At this point, we'll also be able to see how many records meet this criterion.
  3. Use the `delete_records` DAG to select all Jamendo records with `audiodownload_allowed` set to `False`. This DAG deletes those records from the `audio` table but stores a copy in the `deleted_audio` table, so they can be restored if we want.
  4. Update the Jamendo DAG to discard records with `audiodownload_allowed` set to `False`, so that no new records of this type are ingested.

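The four steps above can be sketched in plain Python, with records held in memory instead of the catalog's `audio` table. Everything here is hypothetical: the function names, the simplified record shape, and the `DISCARD_NON_DOWNLOADABLE` toggle are illustrative and are not the Jamendo DAG's actual code.

```python
from typing import Optional

# Hypothetical toggle. Keep it False for steps 1-2 (store the flag only) and
# set it to True for step 4 (discard non-downloadable tracks at ingestion).
# Setting it back to False would let these tracks be ingested again.
DISCARD_NON_DOWNLOADABLE = True


def build_meta_data(track: dict) -> dict:
    """Step 1: copy Jamendo's flag into the record's meta_data."""
    return {"audiodownload_allowed": track.get("audiodownload_allowed")}


def record_for(
    track: dict, discard_non_downloadable: bool = DISCARD_NON_DOWNLOADABLE
) -> Optional[dict]:
    """Build a simplified record, or return None to skip the track (step 4)."""
    if discard_non_downloadable and track.get("audiodownload_allowed") is False:
        return None
    return {"foreign_identifier": track["id"], "meta_data": build_meta_data(track)}


def select_for_deletion(records: list[dict]) -> list[dict]:
    """Step 3: mirrors the selection one would hand to the delete_records DAG.

    Only records whose stored flag is explicitly False match; records ingested
    before step 1 have no flag yet and are left alone.
    """
    return [r for r in records if r["meta_data"].get("audiodownload_allowed") is False]


if __name__ == "__main__":
    tracks = [
        {"id": "1687", "audiodownload_allowed": False},
        {"id": "42", "audiodownload_allowed": True},
    ]
    # Steps 1-2: ingest everything and record the flag.
    stored = [record_for(t, discard_non_downloadable=False) for t in tracks]
    print([r["foreign_identifier"] for r in select_for_deletion(stored)])  # → ['1687']
    # Step 4: filter at ingestion time.
    print([t["id"] for t in tracks if record_for(t) is not None])  # → ['42']
```

If `meta_data` is a Postgres `jsonb` column (an assumption here), the step 3 condition would be something like `provider = 'jamendo' AND meta_data->>'audiodownload_allowed' = 'false'`, since `->>` returns a JSON boolean as the text `'false'`.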
stacimc commented 9 months ago

@WordPress/openverse-maintainers What do you think? Should the records be removed, or is there a better solution, perhaps by displaying additional context somewhere on our own result page?

These records appear to have legitimate CC licenses, but they cannot be downloaded. The user experience is particularly confusing because the button on Openverse's single audio result page reads "Get this audio".

krysal commented 9 months ago

Provided the records are still CC-licensed, my opinion is that we shouldn't remove them. Openverse does not guarantee that the original file is fully available to users (see the Freesound case); the main goal is discoverability. It's important to make all the CC works out there on the web known to the public. Plus, this behavior is probably common for works with an NC or ND license, where I assume each author would prefer users to request permission individually.

To reiterate, this is my position only if they are found to still have a CC license; otherwise, they can be deleted.

stacimc commented 9 months ago

It would be helpful to have a better understanding of how Jamendo is treating these. In the Freesound case the track can be downloaded, you just have to log into Freesound.

Here, the free download button is disabled entirely. I tried signing up for Jamendo and was still unable to download the track. If you click the other button on the download page (Get a license on Jamendo licensing), you get this page:

[Screenshot (2023-12-08): the resulting Jamendo licensing page]

A couple of things:

There are zero mentions of the CC license on the download page. However, the icons are still visible on the detail page, right above another link to the Jamendo licensing page:

[Screenshot (2023-12-08): CC license icons on the Jamendo track detail page]

It's really unclear to me what's going on here. The API and the detail page both clearly indicate a CC license, but the messaging across all pages also indicates that a paid Jamendo license is necessary and that the one available download is for "testing purposes only". But to @krysal's point, Openverse only guarantees that its records are CC-licensed, so I agree there's a strong case for leaving them in. I do think this might result in a frustrating experience for a user, however. It would be nice to have some way of filtering these out.

For contrast, here's the download page for a Jamendo record with `audiodownload_allowed` enabled. Note that the CC license is reiterated here:

[Screenshot (2023-12-08): download page for a Jamendo record with `audiodownload_allowed` enabled]

fcoveram commented 9 months ago

Interesting case.

I vote for removing these kinds of media items. The intention of assigning a CC license is to allow the use of a work under certain conditions. It is true that Openverse exists to surface works under CC licenses and make them findable, but if a service puts up a barrier to using them, there is a conceptual conflict: Openverse ends up breaking the user's expectation of being able to use the work.

So far, Openverse's main advice is to ask users to check the license on the source site, not to check whether they can "use something" after clicking "Get this media."

dhruvkb commented 9 months ago

I personally lean towards keeping these tracks in our datastore (so that we are not gatekeeping licensed works from being indexed or not) but not surfacing them high in search results (so that our search results remain accessible and reusable). We could

To be honest, I wouldn't even mind them being removed from the catalog as they are CC-licensed legally but not abiding by the free-sharing spirit of the licenses, but that puts us in the position of choosing what CC-licensed works we index and which we don't. That should not be something Openverse does.
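Dhruv's idea of keeping these tracks but not surfacing them high in results amounts to down-weighting rather than filtering. A minimal sketch, assuming each result already carries a relevance score and a download flag; the `Result` type, the `rerank` helper, and the 0.5 penalty are all hypothetical. If ranking were done inside the search engine instead, the same idea would more likely be expressed in the query itself (for example with an Elasticsearch `function_score`), which this sketch does not attempt.

```python
from dataclasses import dataclass

# Hypothetical down-weighting factor; a real value would need tuning.
NON_DOWNLOADABLE_PENALTY = 0.5


@dataclass
class Result:
    identifier: str
    score: float  # relevance score from the search backend
    download_allowed: bool


def rerank(results: list[Result], penalty: float = NON_DOWNLOADABLE_PENALTY) -> list[Result]:
    """Keep every result, but push non-downloadable ones down by scaling their score."""

    def adjusted(result: Result) -> float:
        return result.score if result.download_allowed else result.score * penalty

    return sorted(results, key=adjusted, reverse=True)


if __name__ == "__main__":
    results = [
        Result("a", 10.0, download_allowed=False),
        Result("b", 6.0, download_allowed=True),
        Result("c", 4.0, download_allowed=True),
    ]
    print([r.identifier for r in rerank(results)])  # → ['b', 'a', 'c']
```

The non-downloadable track "a" stays in the results (it is not gatekept), but its effective score of 5.0 places it below the downloadable "b".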

sarayourfriend commented 9 months ago

My line of thinking follows Dhruv and Francisco's. Openverse is about discoverability, like Krystle said, but discoverability in order to use and remix, not for the sake of discoverability alone.

I like Dhruv's idea to keep them in the catalogue and to either exclude them from search or penalise them. There's a part of me that likes the idea of Openverse's dataset as a "comprehensive-as-possible" account of all known CC licensed works. But our search API is to help people find things they can use and not for researching the landscape of CC licensed works.

The question of whether Jamendo is doing something strange here, if it's in the spirit of the licenses, and so forth, is a good one to forward to Creative Commons. If Jamendo is a big name that is misrepresenting their usage of CC to get the benefit of the perception of open access and open sharing but not actually following that ethos, it's more of a CC question than an Openverse one. Our priority should be to ensure our search fulfills the specific goals within its remit. We can individually participate in conversations about how organisations or individuals interpret or employ CC and whether we agree with those individual approaches, but it feels (to me) outside of Openverse's remit.

To summarise: my vote is to exclude these results from search, if not forever, at least temporarily. If we (as the Openverse project) want to get in touch with Jamendo to understand what they are conveying through the CC licence on those works, I would support that, because it would give us a better understanding of the overall landscape of interpretations of CC, which could help us better understand providers' decisions in the future. If individuals want to comment on Jamendo's approach and ask them to change it, I think that, and commentary about it, is better channelled through the Creative Commons organisation rather than Openverse. Whether to exclude the data from the catalogue is something I have no strong opinion about, and I'd recommend whatever approach makes it easiest to exclude the results from search, rather than creating a new special category of works in the catalogue just for this edge case. I also presume we could easily recover these works through backfilling if we wanted to reintroduce them to the catalogue (for example, if discussions with Jamendo about their intentions with this give us a better understanding of how to present them in search).

stacimc commented 9 months ago

> I also presume we could easily recover these works through backfilling if we wanted to reintroduce them to the catalogue (for example, if discussions with Jamendo about their intentions with this give us a better understanding of how to present them in search).

Definitely! Since the Jamendo DAG is non-dated, if we want to reintroduce them to the catalog, we'd just need to make a simple code change to stop filtering them at ingestion and then run the DAG once (and do a data refresh, of course).

I really like @dhruvkb's idea to push these results down in search as an alternative, but it would involve significantly more work (including frontend work to add some sort of disclaimer on those individual result pages).

krysal commented 9 months ago

I want to emphasize what we mean by usage here. As far as I can tell, the track in the issue can be played in full on Jamendo (and from Openverse as well). Users may not be able to download it, but they can play it on Jamendo as many times as they want. The license also doesn't allow for remixes or adaptations. I don't see why we should make a downloadable file a prerequisite for including works in Openverse; as I said before, I think this will be a common experience for works with these licenses (NC, ND).

That's why I strongly prefer that we add a `download_allowed` field and make it a filter instead of deleting them.
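krysal's proposal, exposing the flag as a filter rather than deleting records, might look roughly like this at the API layer. The `download_allowed` query parameter and the helper names are hypothetical. Leaving the parameter out applies no filter, so the works stay discoverable unless a user asks otherwise; a missing flag on a record is treated as "allowed" (an assumption).

```python
from typing import Optional


def parse_bool_param(raw: Optional[str]) -> Optional[bool]:
    """Parse a hypothetical ?download_allowed= query parameter.

    Returns None when the parameter is absent, meaning "do not filter".
    """
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"invalid boolean parameter: {raw!r}")


def apply_download_filter(results: list[dict], download_allowed: Optional[bool]) -> list[dict]:
    # No value given: keep everything.
    if download_allowed is None:
        return results
    # Records without the flag are treated as downloadable (assumption).
    return [r for r in results if bool(r.get("download_allowed", True)) == download_allowed]


if __name__ == "__main__":
    results = [
        {"id": "1687", "download_allowed": False},
        {"id": "42", "download_allowed": True},
    ]
    print([r["id"] for r in apply_download_filter(results, parse_bool_param("true"))])  # → ['42']
```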

fcoveram commented 9 months ago

> If Jamendo is a big name that is misrepresenting their usage of CC to get the benefit of the perception of open access and open sharing but not actually following that ethos, it's more of a CC question than an Openverse one.

> The license also doesn't allow for remixes or adaptations. I don't get why we should make a downloadable file a prerequisite to include works in Openverse, as I said before, I think this will be a common experience for works with this licenses (NC, ND).

CC BY-NC-ND says (quote): "You are free to: Share — copy and redistribute the material in any medium or format." Therefore, you should be able to access the file.

> I really like @dhruvkb's idea to push these results down in search as an alternative, but it would involve significantly more work (including frontend work to add some sort of disclaimer on those individual result pages).

This could also be a good idea, and so far, I envision a flow similar to how we surface sensitive content. These results are hidden by default, and once you click something like "Show results that cannot be downloaded" (the terrible copy is TBD, of course), you see them in the results. Then, the CTA on the media details page changes its label to something like "See the [media]".
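A sketch of the flow fcoveram describes: non-downloadable results are hidden unless the user opts in, and the single-result button label depends on the flag. The `MediaResult` type and helper names are hypothetical, and the button copy is placeholder, based on the existing "Get this audio" label and the suggested "See the [media]".

```python
from dataclasses import dataclass


@dataclass
class MediaResult:
    identifier: str
    media_type: str  # e.g. "audio" or "image"
    download_allowed: bool


def visible_results(
    results: list[MediaResult], show_non_downloadable: bool = False
) -> list[MediaResult]:
    """Hide non-downloadable works unless the user opts in (off by default)."""
    if show_non_downloadable:
        return results
    return [r for r in results if r.download_allowed]


def cta_label(result: MediaResult) -> str:
    """Pick the details-page button text; the wording is placeholder copy."""
    if result.download_allowed:
        return f"Get this {result.media_type}"
    return f"See the {result.media_type}"


if __name__ == "__main__":
    items = [
        MediaResult("1687", "audio", download_allowed=False),
        MediaResult("42", "audio", download_allowed=True),
    ]
    print([r.identifier for r in visible_results(items)])  # → ['42']
    print(cta_label(items[0]))  # → See the audio
```

Keeping the toggle separate from the label logic means the same flag could also drive a disclaimer on the result page without changing how results are filtered.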

krysal commented 9 months ago

Right! @fcoveram made me rethink, and now I support the idea of penalizing these works, but I'm not 100% convinced that a downloadable file should be a prerequisite to be in Openverse (for NC & ND licenses). "You" as a user are free to copy and redistribute, but is the creator obligated to share the file? Does being online free to play count? That is more of a question for CC folks, as mentioned before. There is also the possibility that the author may prefer people to contact them directly to obtain the file.

AetherUnbound commented 9 months ago

Thanks everyone for sharing your perspectives here! I agree that removing these records until we're able to handle them more appropriately (either with penalization, filters, or altered CTA on the details page) is the best way to go to prevent reputational harm to Openverse.

It seems that our immediate next step is to alter the Jamendo DAG to start capturing this information, does that seem right @stacimc? Would it be much work to add that to the metadata field?