The main reason I can see comes down to timing:
For refs in version N, we take the tokenizer output, and any token that matches a def already in the database is considered a ref. Depending on timing, some of those defs may have been added while parsing version N+1, so we can end up matching against defs that don't exist in version N.
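A minimal sketch of the current behaviour (the names here are hypothetical, not the actual update.py API): the check is only "is this token a def in the database", with no notion of which version the def belongs to.

```python
def collect_refs(tokens, defs_db):
    """Record as refs all tokens that already exist as defs in the database."""
    refs = []
    for tok in tokens:
        # defs_db may already contain defs parsed for version N+1,
        # so this can match "future" defs depending on timing.
        if tok in defs_db:
            refs.append(tok)
    return refs
```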
Fixing this would require having the list of defs in version N available when parsing refs of version N, i.e. all defs that come from blobs belonging to version N. This can be expensive to compute. Note that it is not just the list of new blobs in version N (the ones we just parsed); it also includes blobs parsed long ago that are still part of version N.
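A hypothetical sketch of what the fix implies (again, names are illustrative, not the real update.py code): refs are only accepted when the matching def comes from a blob that is part of version N, which means the full set of version-N blobs has to be known up front.

```python
def collect_refs_for_version(tokens, defs_db, version_blobs):
    """Accept a token as a ref only if it is defined in a blob of version N.

    defs_db maps identifier -> set of blob ids where it is defined.
    version_blobs is the (potentially expensive) set of all blob ids in
    version N, old and newly parsed alike.
    """
    refs = []
    for tok in tokens:
        blobs_with_def = defs_db.get(tok, set())
        if blobs_with_def & version_blobs:
            refs.append(tok)
    return refs
```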
I plan on addressing this as part of #289. This, however, means the outputs of the old update.py and the new one won't be exactly identical, which was a property I had tried to preserve for easy testing.
To reproduce: