benwbrum / fromthepage

FromThePage is a wiki-like application for crowdsourcing transcription of handwritten documents.
http://fromthepage.com
GNU Affero General Public License v3.0

Page-level categorization #4220

Open Simon-Dirks opened 1 month ago

Simon-Dirks commented 1 month ago

Dear FromThePage team,

Thank you so much for the beautiful project! We are self-hosting an instance at Utrecht University (the Netherlands) for the card index system of the American philosopher Susanne Langer.

Though we're very happy with the subject indexing functionality (e.g., to label people, places, etc.), I could not find any functionality to similarly assign "categories" at the page/scan level.

As an example, it would be of major added value for us if we could label pages as "card index dividers", "biographical cards", "shopping lists", etc. I imagine functionality that looks very similar to categorizing subjects, with a multi-select dropdown: [image]

Is there already something written for this? Or, if not, any pointers on how to get started?

Wishing you all the best!

-Simon

benwbrum commented 1 month ago

Hi Simon,

It's great to hear that Utrecht University is using FromThePage! We'll have to add a new pin to our map of global open-source installations.

Are you using field-based transcription or document-based transcription for the project? If the collection is field-based, I would just add a new field for the type of page.

If you are using document-based transcription (and it sounds like this may be the case, since subject indexing is only available in that workflow), there is really no way to categorize individual pages in the way you describe. That said, if these different lists are divided into separate FromThePage works, you could use metadata description to classify each work by type of document, as Dartmouth College is doing with their Wrangel Island project.

Simon-Dirks commented 1 month ago

Hi Ben,

Thanks for the quick response! We are using document-based transcription for the project, and unfortunately each "work" contains many scans in our case. We've structured the collection into works according to the archival box structure (so Box 1 at the Harvard Library became Work/Box 1 in FromThePage, if that makes sense).

I'm not too familiar with Rails development, but I do have extensive web dev experience. This feature is really important to us, so I would love to give it a shot to see if I can implement something like this. Could you by any chance give me some pointers on where to start?