Yooooomi / your_spotify

Self hosted Spotify tracking dashboard
GNU General Public License v3.0

Performance improvements #232

Closed lorenzopicoli closed 7 months ago

lorenzopicoli commented 1 year ago

Hello, thank you so much for the project!

I'm running the server for myself and a few friends, and I've noticed that performance when checking a large time period is not great. I did an extended history import with 144189 songs, and 4 other friends did the same with similar amounts on average. If I check my yearly stats, it takes several seconds to even load the page while my machine sits at 100% CPU. For all-time stats it takes even longer, sometimes more than a minute. I can see that this is the result of very long-running queries.

I've tried running it both on a raspberry pi 4 and on a Hetzner instance with similar results.

Queries like these are very slow to run:

db.collection_name.aggregate([
  { 
    // Keep only this user's plays inside the requested time window,
    // skipping blacklisted entries
    $match: { 
      owner: ObjectId("6418b3970362c4b75eb3aa2c"), 
      blacklistedBy: { $exists: false }, 
      played_at: { 
        $gt: ISODate("2015-01-11T20:01:15.000Z"), 
        $lt: ISODate("2023-04-17T04:12:12.007Z") 
      } 
    } 
  }, 
  { 
    // Split each play's timestamp into calendar parts in the user's timezone
    $project: { 
      year: { $year: { date: "$played_at", timezone: "America/Toronto" } }, 
      month: { $month: { date: "$played_at", timezone: "America/Toronto" } }, 
      day: { $dayOfMonth: { date: "$played_at", timezone: "America/Toronto" } }, 
      week: { $week: { date: "$played_at", timezone: "America/Toronto" } }, 
      hour: { $hour: { date: "$played_at", timezone: "America/Toronto" } }, 
      id: 1 
    } 
  }, 
  // Join each play with its track to get the duration, then sum per year
  { $lookup: { from: "tracks", localField: "id", foreignField: "id", as: "track" } }, 
  { $unwind: "$track" }, 
  { $group: { _id: { year: "$year" }, count: { $sum: "$track.duration_ms" } } }, 
  { $sort: { "_id.year": 1 } }
])
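For what it's worth, a quick way to see where the time goes in a pipeline like this is to ask Mongo for execution stats. This is just standard mongosh usage, not anything specific to your_spotify, and it reuses the placeholder collection name from the query above; the remaining stages can be appended unchanged:

// Wrap the pipeline in explain() to get per-stage timings and to see
// whether the $match stage is served by an index or a collection scan
db.collection_name.explain("executionStats").aggregate([
  {
    $match: {
      owner: ObjectId("6418b3970362c4b75eb3aa2c"),
      blacklistedBy: { $exists: false },
      played_at: {
        $gt: ISODate("2015-01-11T20:01:15.000Z"),
        $lt: ISODate("2023-04-17T04:12:12.007Z")
      }
    }
  }
])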

I'll probably look into these issues myself at some point, but I'm not super comfortable with NoSQL databases, so I'm not sure I'll get good results. Does the maintainer have any plans to tackle these performance issues?

Yooooomi commented 1 year ago

Hey! I'm sorry you're having performance issues. The query you use as an example can't get much smaller. The statistics are built on demand, which is why some of them require a bit more compute. Which version of Mongo are you using?

lorenzopicoli commented 1 year ago

Yeah, I can come back at some point this week with better benchmarks of which queries are taking long. I'm not too familiar with NoSQL, but maybe there are some indexes (or equivalents?) we could add to speed up these queries?
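For illustration only, a compound index covering the fields in the $match above might look like this in mongosh. The collection and field names are simply taken from the example query, and whether it actually helps depends on the indexes the project already creates:

// Equality on owner first, then the played_at range.
// The blacklistedBy filter uses $exists and generally can't be
// served efficiently by a regular index.
db.collection_name.createIndex({ owner: 1, played_at: 1 })

Even with an index like that, the $lookup/$unwind join into tracks still runs once per matched play, which is probably where a large share of the time goes.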

I'm still on version 4. I saw that it was upgraded to 6 in the past few days. I'll give it a try and see if it magically fixes it hahaha

Yooooomi commented 1 year ago

Since upgrading is not officially documented, you should follow the comment at https://github.com/Yooooomi/your_spotify/issues/223#issuecomment-1510380689

HeyBanditoz commented 1 year ago

Is this something that could be improved if we just treated the data as relational and used a SQL database like Postgres instead?

lorenzopicoli commented 1 year ago

I've upgraded to Mongo 6 and it's still very slow when loading all the data. It does run faster, but that might just be because Mongo 6 can't run on my Raspberry Pi, so I've moved to a better machine.

I love this project and would love to help contribute whenever I get some time. Would moving away from NoSQL be something you'd be open to @Yooooomi?

Yooooomi commented 1 year ago

Hello. I'm glad Mongo 6 is a bit more performant. As for the SQL question, I honestly have no idea how it would perform. I feel like Mongo might not be the best choice here, but I don't know what would be better. I chose Mongo years ago when I had a bit less knowledge; it allows very easy data manipulation and construction, and I'm afraid that with Postgres things would not be nearly as easy. This is to be continued for sure, but it would probably be a big undertaking.

HeyBanditoz commented 1 year ago

Maybe indexes would work? It seems MongoDB supports them: https://www.mongodb.com/docs/manual/indexes/. Would you add the DB scripts to the migrations folder to create them?

That would definitely be an easier option than switching DBMSes entirely.

Yooooomi commented 1 year ago

I am already using indexes; if I were not, queries would take several minutes instead of 5 seconds. I'll investigate the subject a bit.
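In case it helps the investigation, mongosh can also report which indexes exist and how often each one is actually used. Again, this uses the placeholder collection name from the example query rather than the project's real collection names:

// List the indexes defined on the collection
db.collection_name.getIndexes()

// Show per-index usage counters accumulated since the server last started
db.collection_name.aggregate([{ $indexStats: {} }])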

Karakazz commented 1 year ago

I just upgraded to MongoDB 6.0 and performance improved a lot. I had to run the following inside the Mongo container after upgrading to 5.0.16: db.adminCommand( { setFeatureCompatibilityVersion: "6.0" } )

After that I was able to upgrade the DB to 6.0 with no issues.
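For anyone else upgrading by hand, MongoDB's documented path is one major version at a time, raising the feature compatibility version after each binary upgrade. Roughly, it looks like this in mongosh (this is generic MongoDB procedure, not specific to your_spotify):

// Check the current feature compatibility version
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })

// After upgrading the binaries to 5.0.x, raise the FCV to 5.0
db.adminCommand({ setFeatureCompatibilityVersion: "5.0" })

// Then upgrade the binaries to 6.0.x and raise the FCV again
db.adminCommand({ setFeatureCompatibilityVersion: "6.0" })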

Yooooomi commented 1 year ago

The upgrade is done automatically; you didn't need to run the command with the latest nightly.

lorenzopicoli commented 1 year ago

After a week running on Mongo v6 I can confirm that there is some performance improvement. Mongo's CPU usage is still really high and a good host is needed to load the all-time stats, but it's usable for now.

Karakazz commented 1 year ago

The upgrade is done automatically; you didn't need to run the command with the latest nightly.

I'm currently running a heavily modified build based on a previous release, which didn't update automatically. I did try it though! :D

The performance has significantly improved after a week of testing. Currently, my "account" contains 173k songs out of a total of approximately half a million songs, and loading "All" on "All Stats" now only takes around 20 seconds, as compared to a couple of minutes previously.

filcuk commented 1 year ago

Also consider SQLite as an option. It's often greatly underestimated and should be simple to implement. Even WordPress is switching to it.

Yooooomi commented 1 year ago

Hello, I think SQLite is nowhere near MongoDB's performance for this kind of workload. WordPress is not switching to it; it allows sites built with WordPress to use it, which is a huge difference. I tried to migrate to PostgreSQL in the past, but I feel like NoSQL is better for the moment. I'm not a database expert myself, but I feel like I need a database built for real-time computation and aggregation of data, which is not the case for PostgreSQL or SQLite.

NB: all of this should be possible (it certainly is) using a PostgreSQL database, but boy, I don't want to write those 200-line queries.

quentinguidee commented 7 months ago

Fixed by #348, #354 and #355 (release 1.8.0)