Closed quentinguidee closed 4 months ago
Also, I think a lot of requests can beneficiate from this improvement. If everything looks fine I might try to improve the others.
I just realised that the request is sorted by listening count, not total duration, so we can directly limit/offset at start, which would make the request instantaneous
Edit it seems to work (0,2s), but the total_count is not calculated correctly, so I reverted to the 1,2s version
Hello! Thanks a lot for what you've done. It seems good to me.
The facet is a bit weird since I'm not used to it.
Would be interesting to test if it outputs the exact same information.
This performance boost is insane thanks a lot! I don't want to abuse or anything but yeah it would be very good to have a brand new look on other queries too.
Maybe (surely) getBestArtistsNbOffseted
and getBestAlbumsNbOffseted
would benefit from this too.
I've tried working on #162 at some point but did not find any better implementation back then.
Again, thanks a lot. Don't hesitate if you have any questions, and keep in touch when you feel like you're done with this :)
Hi!
Would be interesting to test if it outputs the exact same information.
They seem to have the exact same result in my case! However, it may be good to test again before merging with other data!
I'll try to improve the PR in the next days to cover more improvements ๐
I'll try to improve the PR in the next days to cover more improvements ๐
Thanks a lot
With the same technique, https://github.com/Yooooomi/your_spotify/pull/348/commits/5669865099a6981c83c03c2d88bc878b0b56f6f6 makes the /top/artists request faster, from 12.1s to 1.3s
For this request however:
differents
field, it seems to come from other requests like differentArtistsPer
.And to conclude, /top/albums from 12s to 1.4s ๐
Should be ready for review! ๐
Hello @quentinguidee, congrats for the optimization it's pretty impressive ๐ If you have time, could you please have a look at the new mongoDb requests added in my PR https://github.com/Yooooomi/your_spotify/pull/346 to see if it can also be optimized ?
@Yooooomi I have a MongoDB question. I think that almost every requests in this project could be instantaneous if we could have duration_ms
, track_id
, album_id
, artists_id
directly indexed in the InfosModel. I don't see another way to do this than adding these fields to all InfosModel. Do you know any way to achieve this automatically with MongoDB?
I can write a migration that adds these fields in the info model. I'll be working on it later today if this can help you.
Thanks! That would be helpful. Also this might open the path for more precise total duration when storing each listening duration separately
Also this might open the path for more precise total duration when storing each listening duration separately
The Spotify API does allow to get the specific listening duration for the recently played tracks.
Yeah unfortunately, but this might be approximated with the time elapsed between two entries in the history? And by default set it to the track duration?
Also, iirc the spotify data when downloaded include the duration of each listening
Spotify history is sometimes not precise, relying on timestamps to establish listening duration might be dangerous. I'm pushing the migration to release/1.8.0 branch. You will be able to rebase/merge from there to have access to albumId, artistId and durationMs in the Infos model.
It's pushed to release/1.8.0
@Yooooomi It seems that the migration does not trigger, is that because the timestamp of the migration is wrong? (probably the duplication of vscode which incremented the timestamp by one). I tried to npm migrate manually but it says its migrated (but the infos don't have the new fields)
It did trigger on my instance. Are you sure you have the latest code built?
Oh the migration was marked as done in the collection "migrations", but was not done for a reason I don't understand. I did a mongorestore and restarted, and it worked
Yeah that works really well, I achieve a request of ~300ms instead of 12.1s with the new fields ๐ And the requests are also simpler in the code (we can extract almost everything to a new function)
InfosModel.aggregate([
{ $match: basicMatch(user._id, start, end) },
{
$group: {
_id: "$id",
duration_ms: { $sum: "$durationMs" },
count: { $sum: 1 },
},
},
{
$facet: {
infos: [
{ $sort: { count: -1, _id: 1 } },
{ $skip: offset },
{ $limit: nb },
],
computations: [
{
$group: {
_id: null,
total_duration_ms: { $sum: "$duration_ms" },
total_count: { $sum: "$count" },
},
},
],
},
},
{ $unwind: "$infos" },
{ $unwind: "$computations" },
{
$project: {
_id: "$infos._id",
result: {
$mergeObjects: ["$infos", "$computations"],
},
},
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: ["$result", { _id: "$_id" }],
},
},
},
{ $lookup: lightTrackLookupPipeline("_id") },
{ $unwind: "$track" },
{ $lookup: lightAlbumLookupPipeline() },
{ $unwind: "$album" },
{ $lookup: lightArtistLookupPipeline() },
{ $unwind: "$artist" },
]);
The $facet
is a way to make two different data process from the same data source. So the first part takes only the most listened tracks, while the second takes the total count and duration from all tracks. Then both are merged
Nice job! I don't remember why we compute total_duration_ms. This feels weird when we just take a subpart of the whole thing.
I think this is for the percentages here:
But if we remove them the request should be extremely fast
I feel like we could do this in a request on the side. So that this request does not have to compute this for every page.
I just pushed a new file architecture to the branch so that it follows modern monorepo architecture. You might encounter issues. I would suggest you merge the 1.8.0 to your branch to see if there are any merge conflicts.
Thanks! Everything looks fine, the 3 requests are almost instantaneous now. I have also refactor the 3 to use the same base request. I think the "differents" field and some lookup can be removed depending on the request (tracks/artists/albums) but that doesn't change the time so clean code is better imo for now while we recreate the requests
I'll keep this PR restricted to these 3 methods for now so we can iterate faster, so this one is ready for review ๐งช
Hello, this seems fine to me. Does it change the signature fontend? It seems the objects are not looked up anymore so the frontend cannot display information am I right? Edit : didn't see the lookup pipeline my bad. Seems fine to me, going to merge this evening to test.
OK OK OK this is BLAZING fast. Got from 20.48s down to 0.81s on M1 Pro, I'll update this with numbers from intel i7. Infos mongo collection is 10 times bigger though but still only 41MB (32MB of indexes). Huge work thanks a lot. Can't wait to see how it can be applied to other requests. Many users will thank those performance improvements.
Yeah that changes entirely the feeling of the app! The tradeoff with document databases compared to relational db is that join requests are really bad and should be avoided as much as possible. Without that the performances are extremely good! I'll work on the other queries really soon, I have a lot of free time at the moment.
For the 30MB of indexes yeah that was predictable, but I think it is nothing compared to the space it took in RAM during the giant request (but I don't have benchmarks for that).
While requests are performant for many use cases, I have around 225000 listened songs (23000 different) with Your Spotify hosted on a Raspberry PI, so navigate on the client takes a lot of time, especially with the "Last year" or "All" timeframe.
This PR improves performances of
getBestSongsNbOffseted
,getBestArtistsNbOffseted
andgetBestAlbumsNbOffseted
by a factor of ~10. My approach was to group allInfos
of the same track at start. This avoids to compute many times the sameInfo
, which helps a lot for thelightTrackLookupPipeline
.Benchmark
done on mac m1, not on the Raspberry
A request for the top songs with date range "All" before this PR takes in my case
11,8s
:After this PR, this time drops to
1,4s
, which makes the page loads almost instantly for the biggest request.Update: See below for another performance boost (https://github.com/Yooooomi/your_spotify/pull/348#issuecomment-1967481756)
For reviewer
$project: { ...getGroupByDateProjection(user.settings.timezone), id: 1 }
step. I'm not sure what it does exactly. However, we can add it back if necessary.Related
232, #162