geoff-maddock / events-tracker

CRM and calendar to track events, weekly and monthly series, promoters, artists, producers, djs, venues and other entities.
https://arcane.city
MIT License

Bandcamp - Spider URL function #921

Open geoff-maddock opened 1 year ago

geoff-maddock commented 1 year ago

Story

In order to build audio playlists for entities, I would like to be able to provide a single Bandcamp link and get back a list of all albums or tracks on that page. Ideally the results could also be filtered by search criteria such as title, creation date and popularity.

Possible Solutions

Use a cURL call from the server to fetch the URL and scrape the page. This is not standardized via a public API, so it's a bit sketchy to rely on.

Look into whether a third-party API already does this. Buymusic.club has an API that returns data from album/track links.

Lastly, skip spidering entirely: have users add specific media links to entities and use only that data to build playlists. Even from an album or track link, we'll still need a way to collect the tracks and the audio links (again, via buymusic.club or the page metadata).

Useful general scraping: https://www.scrapingbee.com/blog/web-scraping-php/

JPlayer is a bit old, but still looks like a pretty good option: https://jplayer.org/latest/demo-02/

geoff-maddock commented 1 year ago

Examine the ripple code: it looks like it uses preg_match on the page metadata to get data from Bandcamp. I might be able to extend that to get release data from the top-level URLs: https://github.com/jamband/ripple/blob/main/src/Providers/Bandcamp.php

Spike:

Start with a Bandcamp URL for a page that is full of tracks.

Add a function that queries that page, gets all the tracks on it and returns an array of data including the player URLs. See if I can do it directly; if not, leverage the buymusic.club API.

Use that data to build an audio player template that can be displayed.

geoff-maddock commented 1 year ago

This code has the functionality to spider an artist page: https://github.com/Otiel/BandcampDownloader

Looks like it's just regex-matching links in the page: https://github.com/Otiel/BandcampDownloader/blob/master/src/BandcampDownloader/Helpers/BandcampHelper.cs
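A minimal sketch of that link-matching idea, written in Python for illustration rather than taken from BandcampDownloader. The function name, the regex and the sample markup (including the example-track link) are my own assumptions; real pages may structure their links differently.

```python
import re
from urllib.parse import urljoin

# Assumption: release links appear as href="..." values containing /album/ or /track/.
LINK_RE = re.compile(r'href="([^"]*/(?:album|track)/[^"#?]+)', re.IGNORECASE)


def extract_release_links(html: str, base_url: str) -> list:
    """Return unique album/track URLs found in html, in page order."""
    seen = {}
    for href in LINK_RE.findall(html):
        # Resolve relative links such as /album/foo against the page URL.
        seen.setdefault(urljoin(base_url, href), None)
    return list(seen)


# Hypothetical markup, only for demonstration.
sample_page = """
<a href="/album/streetheat-001">one</a>
<a href="/album/streetheat-001">duplicate</a>
<a href="https://streetheat412.bandcamp.com/track/example-track">two</a>
<a href="/merch">ignored</a>
"""

release_links = extract_release_links(sample_page, "https://streetheat412.bandcamp.com")
print(release_links)
```

Duplicates are dropped because the same release is often linked more than once on a page (cover image plus title).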

geoff-maddock commented 1 year ago

Artist page to work with: https://0h85.bandcamp.com/

Release page to work with: https://streetheat412.bandcamp.com

<meta property="og:video" content="https://bandcamp.com/EmbeddedPlayer/v=2/album=3009540073/size=large/tracklist=false/artwork=small/">

So I could probably skip the ripple step if I just add my own function for getting metadata from related URLs.
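A rough sketch of such a function, in Python for illustration. It reads the og:video meta tag shown above and pulls the numeric ID out of the embedded player URL. The class and function names are hypothetical, and the track= form of the player URL is an assumption; only the album= form appears in this thread.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("property") and attr.get("content"):
            # Keep the first value seen for each property.
            self.properties.setdefault(attr["property"], attr["content"])


def embedded_player(html: str):
    """Return (player_url, kind, numeric_id) from og:video, or None if absent."""
    collector = MetaCollector()
    collector.feed(html)
    url = collector.properties.get("og:video")
    if url is None:
        return None
    # Assumption: the ID is carried as /album=<digits> or /track=<digits>.
    match = re.search(r"/(album|track)=(\d+)", url)
    if match is None:
        return url, None, None
    return url, match.group(1), int(match.group(2))


sample_head = (
    '<meta property="og:video" content="https://bandcamp.com/EmbeddedPlayer/'
    'v=2/album=3009540073/size=large/tracklist=false/artwork=small/">'
)
player = embedded_player(sample_head)
print(player)
```

The returned URL can be dropped straight into an iframe for the player template, and the kind and ID can be stored alongside it.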

This specific page contains only album links, for example: https://streetheat412.bandcamp.com/album/streetheat-001

There are links in the application/ld+json script block in the page head. Processing that block: https://stackoverflow.com/questions/71807406/php-get-application-ldjson-data-from-external-recipe-page

At the album level, the data uses a standard AlbumRelease format from Schema.org.
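A hedged sketch of reading that JSON-LD block, in Python for illustration. The assumed shape (a track object holding itemListElement entries, each with a position and an item carrying name and @id) is my guess at the Schema.org layout and needs checking against a real album page. The sample data below is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def album_tracks(html: str) -> list:
    """Return position/name/url dicts for each track listed in the JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    tracks = []
    for doc in collector.documents:
        track_list = doc.get("track") if isinstance(doc, dict) else None
        if not isinstance(track_list, dict):
            continue
        for entry in track_list.get("itemListElement", []):
            item = entry.get("item", {})
            tracks.append({
                "position": entry.get("position"),
                "name": item.get("name"),
                "url": item.get("@id"),
            })
    return tracks


# Made-up sample in the assumed shape, not copied from a real page.
sample_album = """<script type="application/ld+json">
{"@type": "MusicAlbum", "name": "Example Album",
 "track": {"itemListElement": [
   {"position": 1, "item": {"name": "First", "@id": "https://example.bandcamp.com/track/first"}},
   {"position": 2, "item": {"name": "Second", "@id": "https://example.bandcamp.com/track/second"}}
 ]}}
</script>"""
tracks = album_tracks(sample_album)
print(tracks)
```

Parsing the JSON instead of regex-matching the page should be less brittle if the markup around it changes.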

geoff-maddock commented 1 year ago

Make the spidering process create a cache of resources.

Run it as an offline task that processes resource links and populates some cache tables in the database.

// links that can be spidered to add resources - may not have actual audio or releases associated with them
// use processes to extract them from link objects as well as from event body text
MediaLink
    id
    url
    media_service_type_id
    imported_from_object_type
    imported_from_object_id
    created_at

MediaServiceType
    id
    name // bandcamp, soundcloud, youtube, etc

MediaType
    id
    name // container, album, track, playlist

MediaResource
    id
    url
    stream_url
    embed_string
    media_service_type_id // bandcamp, soundcloud, youtube, etc.
    media_type_id // container, album, track, playlist
    created_at
    updated_at
    spidered_at
    status

MediaAlbum
    id
    url
    stream_url
    media_resource_id

MediaTrack
    id
    url
    stream_url
    media_album_id
    media_resource_id

MediaPlaylist (aka set)
    id
    url
    stream_url
    media_resource_id

MediaTrackPlaylist
    media_track_id
    media_playlist_id
    order

MediaAlbumEntity
MediaTrackEntity
MediaPlaylistEntity
MediaAlbumEvent
MediaTrackEvent
MediaPlaylistEvent
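To sanity-check the cache idea, here is a small sketch of part of that schema using Python's sqlite3 with an in-memory database. Only four of the tables are included. The snake_case table names, column types, the UNIQUE constraint on url and the 'pending'/'spidered' status values are all assumptions, not decisions made in this thread.

```python
import sqlite3

SCHEMA = """
CREATE TABLE media_service_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE media_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE media_links (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    media_service_type_id INTEGER REFERENCES media_service_types(id),
    imported_from_object_type TEXT,
    imported_from_object_id INTEGER,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE media_resources (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    stream_url TEXT,
    embed_string TEXT,
    media_service_type_id INTEGER REFERENCES media_service_types(id),
    media_type_id INTEGER REFERENCES media_types(id),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
    spidered_at TEXT,
    status TEXT DEFAULT 'pending'
);
"""


def open_cache() -> sqlite3.Connection:
    """Create the in-memory cache tables and seed the lookup values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO media_service_types (name) VALUES (?)",
                     [("bandcamp",), ("soundcloud",), ("youtube",)])
    conn.executemany("INSERT INTO media_types (name) VALUES (?)",
                     [("container",), ("album",), ("track",), ("playlist",)])
    return conn


def lookup_id(conn, table: str, name: str) -> int:
    """Resolve a lookup name to its id; table is one of the two fixed lookup tables."""
    assert table in ("media_service_types", "media_types")
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def cache_resource(conn, url, service, media_type, embed_string=None) -> int:
    """Insert a spidered resource, or refresh it if the URL is already cached."""
    row = conn.execute("SELECT id FROM media_resources WHERE url = ?", (url,)).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE media_resources SET embed_string = ?, status = 'spidered', "
            "updated_at = CURRENT_TIMESTAMP, spidered_at = CURRENT_TIMESTAMP WHERE id = ?",
            (embed_string, row[0]),
        )
        return row[0]
    cursor = conn.execute(
        "INSERT INTO media_resources (url, embed_string, media_service_type_id, "
        "media_type_id, spidered_at, status) VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP, 'spidered')",
        (url, embed_string, lookup_id(conn, "media_service_types", service),
         lookup_id(conn, "media_types", media_type)),
    )
    return cursor.lastrowid


conn = open_cache()
album_url = "https://streetheat412.bandcamp.com/album/streetheat-001"
first_id = cache_resource(conn, album_url, "bandcamp", "album")
second_id = cache_resource(conn, album_url, "bandcamp", "album", embed_string="<iframe>")
```

Re-spidering the same URL updates the existing row instead of adding a duplicate, which is the behavior an offline task that runs repeatedly would need.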

geoff-maddock commented 1 year ago

Categorize all forms of bandcamp links:

geoff-maddock commented 1 year ago

Make sure I have all these covered
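One way to check coverage is a small classifier with a test per link form. The Python sketch below only handles the forms that actually appear in this thread (a subdomain root page, /album/ and /track/ paths, and the EmbeddedPlayer URL). The category names and function name are hypothetical, and custom domains or any other link forms are not handled.

```python
import re
from urllib.parse import urlparse


def classify_bandcamp_url(url: str) -> str:
    """Return 'root', 'album', 'track', 'embedded_player' or 'unknown'."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")

    if host == "bandcamp.com":
        # Player URLs live on the main domain, e.g. /EmbeddedPlayer/v=2/album=...
        return "embedded_player" if path.startswith("/EmbeddedPlayer") else "unknown"
    if not host.endswith(".bandcamp.com"):
        return "unknown"  # custom domains are out of scope for this sketch
    if path == "":
        return "root"  # artist or label landing page
    if re.fullmatch(r"/album/[^/]+", path):
        return "album"
    if re.fullmatch(r"/track/[^/]+", path):
        return "track"
    return "unknown"


for link in (
    "https://0h85.bandcamp.com/",
    "https://streetheat412.bandcamp.com/album/streetheat-001",
    "https://bandcamp.com/EmbeddedPlayer/v=2/album=3009540073/size=large/",
):
    print(link, "->", classify_bandcamp_url(link))
```

Anything that comes back as 'unknown' during spidering would be a link form still missing from the list.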