Kareadita / Kavita

Kavita is a fast, feature rich, cross platform reading server. Built with the goal of being a full solution for all your reading needs. Setup your own server and share your reading collection with your friends and family.
http://www.kavitareader.com
GNU General Public License v3.0

Statistics pages not including archive chapters estimated under 1hr time in reading statistics #2961

Open Cducharme84 opened 1 month ago

Cducharme84 commented 1 month ago

What happened?

Noticed that despite reading over 40 series totaling over 6100 read pages, both the user and statistics pages showed 2 hours total read time. All series and chapter estimates show correctly on the series page, so the estimate is being accurately assessed prior to read state.

Upon digging into the DB, I noticed that all the read time values appear to be integers counting whole hours, so anything calculated at under an hour is a 0 everywhere the value is stored. For the series/volume tables this leads to the intended display in the UI: a rough, hour-based estimate on the series page.

Chapters whose estimated read time does not total an hour are recorded as zero, and that zero is what the read time calculations use, including the statistics pages.

Dup on Discord helped identify the section that appears to be at fault. In https://github.com/Kareadita/Kavita/blob/97ffdd097504ff9896f626bc7e0deb0c6e743d9d/API/Services/StatisticService.cs the following logic discards chapters with a read time value of zero when calculating the page count for the statistics display:

.Where(p => p.chapter.AvgHoursToRead > 0)
.SumAsync(p =>
     p.chapter.AvgHoursToRead * (p.progress.PagesRead / (1.0f * p.chapter.Pages))));

In my case the discrepancy is 11 days of estimated read time, based on an archive read count of 6100+ pages, with the display removing chapters under 1 hour of estimated read time from the page count used for the calculation.
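To make the arithmetic concrete, here is a minimal Python sketch of the filter-and-sum quoted above. It is not Kavita's code: the `ChapterProgress` class, its field names, and the sample data are all hypothetical, and it only assumes what the issue describes, namely that the per-chapter estimate is stored as whole hours.

```python
from dataclasses import dataclass


@dataclass
class ChapterProgress:
    # Hypothetical stand-in for one (chapter, progress) row.
    avg_hours_to_read: int  # whole hours, so sub-hour estimates are stored as 0
    pages: int
    pages_read: int


def total_hours_read(rows, skip_zero=True):
    """Sum estimated hours weighted by reading progress, like the quoted query."""
    return sum(
        r.avg_hours_to_read * (r.pages_read / r.pages)
        for r in rows
        if not skip_zero or r.avg_hours_to_read > 0
    )


# Hypothetical library: 40 fully read comic issues whose estimate truncated
# to 0 hours, plus one fully read volume estimated at 2 hours.
rows = [ChapterProgress(0, 150, 150) for _ in range(40)]
rows.append(ChapterProgress(2, 400, 400))

print(sum(r.pages_read for r in rows))           # 6400 pages read
print(total_hours_read(rows))                    # 2.0 hours reported
print(total_hours_read(rows, skip_zero=False))   # still 2.0
```

In this sketch, dropping the `> 0` check alone leaves the total unchanged, because a stored estimate of 0 contributes 0 hours whether or not the row is filtered out. Under that assumption, the stored precision matters at least as much as the filter.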

Looking at the highly scientific data gathering found in the show-your-server channel on Discord, those who mention primarily reading comics, but reading frequently, seem to have lower calculated totals on their screens than I would expect. It may also be causing manga chapters under 1 hr to be excluded from the calculation, but I have less experience with typical manga archive sizes, so that is conjecture.

Possible solutions:

  1. Remove the exclusion of chapters under an hour from the calculation for the statistics display. I do not know whether this logic was added on purpose, so while it sounds simple, some refactoring may be involved if it was there to accommodate something outside of statistics.
  2. If the exclusionary logic above is needed for other types of books in the statistics calculation, use the low value identified for minutes per page for archive-based completed chapters, regardless of estimated reading time.
  3. Refactor the estimates to float to allow decimal storage, which involves a database schema change. I do not see this option being attractive or even needed, but it may allow more freedom in read estimate displays outside the statistics pages. If the team leaned this way, it would honestly be best looked at in a FR to gauge whether it is worthwhile; I just wanted to present my thoughts on solutions.
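As a rough illustration of why options 2 and 3 change the result, the Python sketch below derives each chapter's estimate from its page count and compares a truncated whole-hour estimate with a fractional one. The `MINUTES_PER_PAGE` value and both function names are hypothetical and are not taken from Kavita.

```python
# Hypothetical reading speed; Kavita's real per-page value is not used here.
MINUTES_PER_PAGE = 0.5


def estimate_hours(pages: int) -> float:
    """Page-based estimate in fractional hours."""
    return pages * MINUTES_PER_PAGE / 60.0


def read_hours(chapters, truncate: bool) -> float:
    """Total read time for (pages, pages_read) pairs.

    truncate=True mimics storing the estimate as whole hours;
    truncate=False keeps the fractional value.
    """
    total = 0.0
    for pages, pages_read in chapters:
        estimate = estimate_hours(pages)
        if truncate:
            estimate = int(estimate)  # a 15-minute chapter becomes 0 hours
        total += estimate * (pages_read / pages)
    return total


# Hypothetical data: 40 fully read 30-page chapters, 15 minutes each.
chapters = [(30, 30)] * 40

print(read_hours(chapters, truncate=True))   # 0.0
print(read_hours(chapters, truncate=False))  # 10.0
```

With whole-hour storage every one of these chapters contributes nothing, while the fractional estimate adds up to ten hours of reading.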

What did you expect?

All completed chapters to have page count included with reading stats

Kavita Version Number - If you do not see your version number listed, please update Kavita and see if your issue still persists.

Nightly Testing Branch

What operating system is Kavita being hosted from?

Docker (Dockerhub Container)

If the issue is being seen on Desktop, what OS are you running where you see the issue?

None

If the issue is being seen in the UI, what browsers are you seeing the problem on?

No response

If the issue is being seen on Mobile, what OS are you running where you see the issue?

None

If the issue is being seen on the Mobile UI, what browsers are you seeing the problem on?

No response

Relevant log output

No response

Additional Notes

Attached is a user with the reading statistics way off (image: IMG_0309).

DieselTech commented 1 month ago

To add to this, which I think is likely related, the server stats don't really line up with what is expected.

image

It's stating total read time is 5.5 days for over 100k files (Which is also wrong. I have about 100k comics alone, then about 40-50k manga across a few different libraries). The total size should be approx. 4.3TB for everything added to Kavita. I have some manga series that on their own are 7-10 days of read time alone.

Cducharme84 commented 1 month ago

Oh yeah, in my first paragraph I meant the user and server statistics pages. Since mine matches 2hrs on both screens, I had assumed the server stats screen was displaying all users added together for read time. It just happens that in my case my other user hasn’t read much on this DB instance, so the totals were the same.

majora2007 commented 1 month ago

This seems to be an oversight from building stats on top of the estimated reading time feature and is a great find. Unfortunately this requires a DB migration and a bit of rework to the codebase. I'll try to get to this in v0.8.3.