Closed Leeingnyo closed 3 years ago
Thanks for the PR! I was trying to review this but got confused about two points. I am probably missing something so it would be great if you could explain it a bit more.
What does @value.size * (instance_sizeof(String) + sizeof(String)) + @value.sum(&.size) + instance_sizeof(SortedEntriesCacheEntry) compute? I don't understand why you are adding instance_sizeof(String) and sizeof(String) together, or why you do @value.sum(&.size).
What are cached_xx and cached_xx_previous? And when clearing the cache, why do we do it in two steps (clear and clean)?

instance_sizeof(SortedEntriesCacheEntry) is sizeof(Time) * 2 + sizeof(String) + sizeof(Array(String)) + sizeof(object id).
Since String and Array are Reference types, they are essentially pointers, so sizeof on them returns the size of a pointer (4 or 8 bytes). I tried to calculate their actual data size. @value, which is of type Array(String), holds @value.size Strings, so I added @value.size * sizeof(String) for the pointers stored in the array. Because String is a reference type, I added instance_sizeof(String) for each String object as well. Each String also points to memory allocated for its character data, so I added @value.sum(&.size) for that allocated memory.
It's not perfectly accurate (I forgot to add the instance_sizeof and data of @key, and there would be other overhead), but I think it comes close to the actual memory consumption. Let me know if Crystal already has a fancy method to do this.
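Putting that estimate into code, it looks roughly like this (a sketch of the idea, not the exact PR code; the field layout is an assumption, and it uses bytesize as suggested later in this thread):

```crystal
class SortedEntriesCacheEntry
  @atime : Time
  @ctime : Time

  def initialize(@key : String, @value : Array(String))
    @atime = @ctime = Time.utc
  end

  # Rough estimate, in bytes, of the memory this entry holds on to.
  def instance_size
    instance_sizeof(SortedEntriesCacheEntry) +                   # the entry object itself
      @value.size * (instance_sizeof(String) + sizeof(String)) + # String headers + pointers in the array
      @value.sum(&.bytesize)                                     # heap data of each string
  end
end
```

As discussed above, this still omits the data behind @key and other overhead; it is a close approximation, not an exact measurement.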
After scanning, entries and titles may have been added or removed. We should invalidate cached_xx for removed entries and titles while preserving the cache for the remaining ones (this is intentional).
At the clear stage, cached_xx is backed up to cached_xx_previous and cached_xx is reinitialized.
During scanning, titles and entries move (that is, preserve) their cached data from cached_xx_previous back to cached_xx.
After scanning, the clean stage reinitializes cached_xx_previous, so the stale data can be GCed.
I think my naming sense is not good 😢
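The two-stage flow could be sketched like this (a hypothetical illustration; the surrounding class and method bodies are assumptions based on the description above):

```crystal
# Sketch of the clear/clean two-stage invalidation.
class ScanCache(T)
  @cached_xx = {} of String => T
  @cached_xx_previous = {} of String => T

  # Stage 1 (clear): before a scan, stash the current cache and start fresh.
  def clear
    @cached_xx_previous = @cached_xx
    @cached_xx = {} of String => T
  end

  # During the scan, each title/entry that still exists moves (= preserves)
  # its cached data from the previous generation into the new one.
  def preserve(key : String)
    if value = @cached_xx_previous[key]?
      @cached_xx[key] = value
    end
  end

  # Stage 2 (clean): after the scan, drop everything that was not preserved,
  # so data for removed titles/entries can be GCed.
  def clean
    @cached_xx_previous = {} of String => T
  end
end
```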
Ah now it makes more sense. Thanks for the clarification!
String#size returns the number of Unicode codepoints in the string, so the value might not equal the actual byte size, but I guess we can assume that we will only be storing ASCII characters.
I think the SortedEntriesCache class can easily be generalized into a general in-memory key-value store to cache other metadata as well. In fact, I think we can implement the same two-stage clearing method you mentioned in it and store the title info data in it too. That way we can have a single size limit that applies to all types of caches. What do you think?
I will test the current implementation and try to improve the naming a bit. I can then start exploring the idea above.
I just noticed there is String#bytesize to get the real byte size!
I think the data cached by InfoCache (i.e. display_name, the sum of deep title progress, and cover_url) is small enough to stay in memory. It is just three or four additional variables on each title and entry, and I actually want it not to be invalidated. But yes, in other words, it is small enough to be kept in an LRU cache system.
I agree that a generalized cache entry would be good. Let me try to do it. It takes time to learn Crystal 😄 I'll ask you for help when I run into problems.
By the way, I checked the total size of the info.json files from my huge library (12000 entries, 500GB, 1 user), which is only 1MB (as text files; the info.json of a title with 1000 entries is about 80KB). It seems fine for the whole set of info.json files to be cached in memory. How about using the generalized LRU cache in TitleInfo#new?
Haha I didn't know about String#bytesize
either but it's definitely a better choice.
The only problem I have with the current implementation is that we have two cache systems, InfoCache and the LRU cache, while the LRU cache can easily be extended to replace InfoCache. I think having a single unified class for caching will make future maintenance and development easier.
Yes, I agree that we can cache the metadata in the generalized LRU cache in TitleInfo#new. Since the generalized LRU cache will be a key-value store, we can use keys like [id]-coverurl and [id]-progress to store and query the data. For SortOption we can serialize it to JSON and cache it.
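A sketch of that key scheme (the LRUCache.get/set calls and SortOptions.from_json are assumptions for illustration, not confirmed API):

```crystal
# Namespace each cached value by ID and field name.
LRUCache.set "#{id}-coverurl", cover_url
LRUCache.set "#{id}-progress", progress.to_s

# SortOption is not a plain string, so serialize it to JSON before caching...
LRUCache.set "#{id}-sort-opt", sort_option.to_json

# ...and parse it back when reading.
if json = LRUCache.get "#{id}-sort-opt"
  sort_option = SortOptions.from_json json if json.is_a? String
end
```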
Let me try to do it. It takes time to learn Crystal 😄 I'll ask you for help when I run into problems.
Sure looking forward to it! As always, thank you for the effort you put into this :D
I fixed my implementation.
LRUCache
This is a general LRU cache that works as a key-value store. Each value is wrapped in a CacheEntry(SaveT, ReturnT) so the cache can track access time.
In CacheEntry(SaveT, ReturnT), SaveT is the type stored in the cache and ReturnT is the type returned by get. For example, the SaveT of the cache entry for sorted_entries is Array(String), while its ReturnT is Array(Entry). If you have a cache entry whose SaveT and ReturnT differ from each other, extend CacheEntry(SaveT, ReturnT) and override self.to_save_t and self.to_return_t. There are two aliases, CacheableType and CacheEntryType; when you want more cacheable types, add them to these aliases.
The generate_cache_entry method helps to make a cache entry easily.
Note that the return value of get should be checked with is_a?.
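Based on the description above, the shape of the API is roughly this (a sketch, not the exact PR code; ids_to_entries is a hypothetical helper):

```crystal
# Generic cache entry: SaveT is what is stored, ReturnT is what `get` returns.
class CacheEntry(SaveT, ReturnT)
  getter key : String
  property atime : Time

  @value : SaveT

  def initialize(@key : String, value : ReturnT)
    @atime = Time.utc
    @value = self.class.to_save_t value
  end

  # Identity by default; override in a subclass when SaveT and ReturnT differ.
  def self.to_save_t(value)
    value
  end

  def self.to_return_t(value)
    value
  end

  def value
    @atime = Time.utc # refresh access time for LRU eviction
    self.class.to_return_t @value
  end
end

# Example subclass: store entry IDs (cheap), return full Entry objects.
class SortedEntriesCacheEntry < CacheEntry(Array(String), Array(Entry))
  def self.to_save_t(value : Array(Entry))
    value.map &.id
  end

  def self.to_return_t(value : Array(String))
    ids_to_entries value # hypothetical helper resolving IDs back to entries
  end
end
```

Since get can return any of the cacheable types, callers narrow the result with is_a?, e.g. return cached if cached.is_a? Array(Entry).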
I've cached
Tuple#instance_size is not implemented well.
auto-generatable CacheableType and CacheEntryType (?)

Thanks! The code looks great overall, and I just have one question: why do we need to cache the sort options and cover URLs separately when we already have the TitleInfo in the cache? You can easily get the JSON from the cache and then retrieve the information you need. This would be a little bit slower, but I think the difference would be marginal. Am I missing something?
auto-generatable CacheableType and CacheEntryType (?)
Perhaps we can somehow achieve this with Crystal's inherited macro. I will look into this.
I agree that caching the sort options and cover URLs separately in the LRU cache is verbose, so I removed them and tested it. It took about 2ms to restore a cached info.json string into a JSON object for a title with about 220 entries, and about 10ms more to load the title page. I think this is fair enough.
I'm going to cache the cover_url in the title instance, like cached_display_name, because getting the cover URLs of its nested titles requires retrieving the info.json of those titles. I think a member variable cached_cover_url is much cheaper.
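That member-variable approach is plain memoization, roughly (a hypothetical sketch; compute_cover_url stands in for the logic that would otherwise read the info.json of nested titles):

```crystal
class Title
  @cached_cover_url : String?

  # Compute once, then reuse the stored value on every later call.
  def cover_url
    @cached_cover_url ||= compute_cover_url
  end
end
```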
Again thanks a lot for the PR! I took the liberty and made some small changes.
Great changes! Thanks for merging
Implementations
To improve page loading, I cached lots of data. The principles are: avoid repeatedly accessing info.json files, plus miscellaneous improvements.
cache entry_cover_url
Like entry_display_name, a cached entry_cover_url improves the rendering of src/views/components/card.html.ecr. This is very effective.
cache display_name of title
This avoids accessing info.json.
change where sort opt are saved
It's okay that sort options are saved when accessing the library or a book with parameters. I removed unnecessary actions. This avoids accessing info.json.
Caching Library
This utility, implemented in cache.cr, helps cached data be preserved after scanning. It caches: the cover_url of each entry, the deep_read_page_count of each title, and info.json files. I think the second one is quite effective.
Sorted Entries Cache
There are lots of calls to sorted_entries, which takes long (222 entries took 80ms on my machine). Besides, it is called three times with the same arguments when accessing a title page. Caching the results is effective since they do not change until the entries are modified. Currently, this option is not enabled by default; you should set the flag to true to enable it. Also, the cache size option doesn't guarantee that the machine consumes memory precisely.
Effectiveness
Test environment
I loaded a title page that has 222 entries (no nested titles, about 11GB) on a SATA3 SSD. I'm sorry that I have tested each case only once.
As is
first access: took 1140ms
second access: 1119ms (not changed)
Both took long, almost the same time.
With only caching entries' cover url
first access: 700ms (x1.6 faster)
second access: 735ms
With full implementation except sorted entries cache
first access: took 233ms (x4.9 faster)
second access: took 238ms
With full implementation
first access: took 115ms (x10 faster)
second access: took 50ms (even faster)
This partly resolves #186.