pdf file indexing

GamerGirlandCo commented 1 year ago

i'm very much aware that the implementation is less than ideal in some places, so if anybody has any suggestions for improvement, please let me know. or you can close this PR and tell me to fuck off. i won't be offended.

blacksmithgu commented 1 year ago

Please keep contributing, your help has been great :)

blacksmithgu commented 1 year ago

As for the code, it broadly looks okay to me. I can see there are some shenanigans in the import worker to get the PDF import code working, though that seems like the PDF library's fault. You may want to move the hackery into a separate standalone function (or into the PDF code) to better isolate it.

Additionally, since we are adding some more file types, we may want to clean up the code a bit, since we are checking the types in multiple places. Maybe add some static functions for checking whether a given file type is "indexable" (i.e., markdown/pdf/canvas) and use those instead of explicitly checking the extension in each place.
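
For illustration, the static-function idea could look roughly like the sketch below. This is not code from the PR or from datacore: `FileTypes`, `extensionOf`, `isIndexable`, and the exact extension strings (`md`, `pdf`, `canvas`) are hypothetical names chosen only to show the shape of the refactor.

```typescript
// Hypothetical sketch: none of these names come from the datacore codebase.
// The single source of truth for which extensions get indexed (assumed values).
const INDEXABLE_EXTENSIONS: ReadonlySet<string> = new Set(["md", "pdf", "canvas"]);

class FileTypes {
    /** Lower-cased extension of a path without the dot, or "" if there is none. */
    static extensionOf(path: string): string {
        // Only look at the final path segment so dots in folder names are ignored.
        const name = path.substring(path.lastIndexOf("/") + 1);
        const dot = name.lastIndexOf(".");
        // dot <= 0 covers both "no dot" and dotfiles such as ".gitignore".
        return dot <= 0 ? "" : name.substring(dot + 1).toLowerCase();
    }

    /** True if the file at this path is one of the types the index handles. */
    static isIndexable(path: string): boolean {
        return INDEXABLE_EXTENSIONS.has(FileTypes.extensionOf(path));
    }
}
```

Call sites would then use `FileTypes.isIndexable(file.path)` rather than comparing extensions inline, so supporting another file type means changing one set instead of several scattered checks.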

blacksmithgu commented 11 months ago

I'm going to adjust the file type logic a bit to make it more general, so it's easier to add other file types (images, etc.) soon.

blacksmithgu / datacore

pdf file indexing #34