boun-tabi-LMG / turkish-academic-text-harvest

MIT License

Handle page numbers concatenated with authors, journals, issues, etc. #6

Closed gokceuludogan closed 1 year ago

gokceuludogan commented 1 year ago

The script currently mishandles lines where page numbers are concatenated with authors, journal names, issue numbers, and other metadata. We need a mechanism that correctly identifies and separates the page numbers in such cases.

furkanakkurt1335 commented 1 year ago

I have changed the count_occurrence function to return the larger of two counts: the count obtained when the target line is matched against the lines as-is, and the count obtained after its leading or trailing digits are removed. The commit is e17f6e1.
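
As a rough illustration of the idea (not the code from the commit), here is a minimal sketch. The signature of count_occurrence, the helper strip_edge_digits, and the sample lines are all assumptions made for this example; the actual function in the repository may differ.

```python
import re
from typing import List


def strip_edge_digits(line: str) -> str:
    """Hypothetical helper: drop digit runs at the start or end of a line,
    together with any whitespace around them."""
    return re.sub(r"^\s*\d+\s*|\s*\d+\s*$", "", line)


def count_occurrence(target: str, lines: List[str]) -> int:
    """Sketch of the described behaviour: return the larger of
    (a) how often the target appears verbatim, and
    (b) how often it appears once edge digits are ignored on both sides."""
    # (a) Direct comparison of the target line against every line.
    exact = sum(1 for line in lines if line.strip() == target.strip())

    # (b) Comparison after removing leading/trailing digits.
    # Skip this when the target consists only of digits, since every
    # bare page number would otherwise match every other one.
    core = strip_edge_digits(target)
    stripped = sum(1 for line in lines if core and strip_edge_digits(line) == core)

    return max(exact, stripped)


# Made-up sample: a running header with a page number glued to either side.
sample_lines = [
    "12 Journal of Turkish Studies",
    "Journal of Turkish Studies 13",
    "14Journal of Turkish Studies",
    "Some body text",
]
header_count = count_occurrence("12 Journal of Turkish Studies", sample_lines)
body_count = count_occurrence("Some body text", sample_lines)
```

With the digits ignored, the three header variants count as the same repeated line, while ordinary body text is still counted once.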

furkanakkurt1335 commented 1 year ago

I assume the goal is ultimately to get rid of both the page numbers and the paper metadata.

furkanakkurt1335 commented 1 year ago

This was solved in 0fe1a15.