ayya-vimala closed this 3 weeks ago
Thank you. I'll look into it soon.
I found a way to look at the source json.gz files, and it looks like the `_[0-9]+` segment-number suffixes have been removed, which is why those segments can't be found.
@angirov What @ayya-vimala describes in the last comment could indeed be a problem in our code that generates the segmentnrs in the dvarapandita project. For Chinese we remove `_[0-9]+`, since that is folio-specific information we don't want for Chinese, but apparently we need it for Pali. This code needs to be adjusted so that we don't run into this problem with Pali files.
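A minimal sketch of the language-dependent handling described above. The function name `normalize_segnr`, the language codes `"chn"` and `"pli"`, and the assumption that the folio suffix only ever appears at the very end of a segment number are all hypothetical; this is not the actual dvarapandita code.

```python
import re

# Hypothetical: the folio suffix is assumed to sit at the very end of the ID,
# so the pattern is anchored with "$" to avoid touching underscores elsewhere.
FOLIO_SUFFIX_RE = re.compile(r"_[0-9]+$")


def normalize_segnr(segnr: str, lang: str) -> str:
    """Strip the _N folio suffix for Chinese only; keep it for Pali and others."""
    if lang == "chn":  # hypothetical language code for Chinese
        return FOLIO_SUFFIX_RE.sub("", segnr)
    return segnr  # Pali (e.g. "pli") keeps its _N suffix


if __name__ == "__main__":
    print(normalize_segnr("atk-s0101a:1271_0", "pli"))
    print(normalize_segnr("atk-s0101a:1271_0", "chn"))
```

With the suffix removal gated on the language, a Pali ID such as atk-s0101a:1271_0 passes through unchanged, while the Chinese behaviour stays as it is.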
I think this is solved now.
Can we review this? I don't know if this still applies.
I think this issue is solved.
Pali data calculations have to be redone due to two errors:

1. The sn, an, and dhp files created problems due to their numbering. This has been corrected in the repository: https://github.com/BuddhaNexus/segmented-pali/tree/master/inputfiles_cut_segments_on_space
2. The Pali parallels calculations have cut off part of the segment numbers, so the calculations do not match the actual segments. For instance, atk-s0101a:1271_0 up to atk-s0101a:1271_0 are all rendered as atk-s0101a:1271, so the correct segment can no longer be found. I've been trying to find where the error occurs, either in the json.gz files or somewhere during data loading, but my computer is too slow to open the json.gz files, so it's hard for me to check.
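To check whether the `_N` suffixes are already missing in the json.gz files (rather than being lost during data loading), a streaming scan avoids opening the files in an editor at all. This is a rough sketch under stated assumptions: the regex for segment IDs is a hypothetical approximation based on IDs like atk-s0101a:1271_0, and memory stays low only if the file contains line breaks (a single-line JSON file is still read as one long line).

```python
import gzip
import re
import sys
from collections import Counter

# Hypothetical pattern for quoted segment IDs such as "atk-s0101a:1271_0":
# a file prefix, a colon, a dotted number, and an optional "_N" suffix (group 2).
SEGMENT_RE = re.compile(r'"([\w.-]+:[\d.]+)(_\d+)?"')


def audit_segment_ids(source, examples: int = 5) -> dict:
    """Count segment IDs with and without an _N suffix in a .json.gz file.

    `source` may be a filename or a binary file object; gzip decompresses
    it on the fly and the text is scanned one line at a time.
    """
    counts = Counter()
    missing = []  # first few IDs lacking the suffix, for a quick manual look
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        for line in fh:
            for match in SEGMENT_RE.finditer(line):
                if match.group(2):
                    counts["with_suffix"] += 1
                else:
                    counts["without_suffix"] += 1
                    if len(missing) < examples:
                        missing.append(match.group(1))
    return {"counts": dict(counts), "examples_without_suffix": missing}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script>.py some_file.json.gz
    print(audit_segment_ids(sys.argv[1]))
```

If `without_suffix` is high for Pali files, the suffixes were already stripped when the json.gz files were generated; if it is zero, the problem is more likely in the data loading step.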