Princeton-CDH / geniza

version 4.x of the Princeton Geniza Project
https://geniza.princeton.edu
Apache License 2.0
Data cleanup for sources of unpublished transcriptions marked with languages (must be unspecified) #1464

Open kseniaryzhova opened 1 year ago

kseniaryzhova commented 1 year ago

In order to help you all maintain consistency across the DB, I've come up with a handful of lists for data cleanup. We can make a new chore issue if that's helpful. I imagine most of these that are unpublished will need the new "Unspecified" language option, but don't want to assume that and programmatically reassign all of them if that's incorrect for any.

- Sources with no language selected (from before the constraint was added; note that all of these happen to be unpublished)
- Sources marked as Judaeo-Arabic (I assume all of these should be reassigned?)
- Sources marked as Ottoman Turkish (this too)
- Unpublished sources marked as Hebrew
- Unpublished sources marked as Arabic

It's possible there are other examples of non-unpublished editions incorrectly marked as Hebrew (I checked Arabic and didn't see any), so it's probably worth going through ALL the sources marked as being in Hebrew and checking them individually:

All sources in Hebrew

blms commented 11 months ago

@kseniaryzhova This one is now good to go, whenever you have time to look through these, as the new "Unspecified" language option is live in production.

richmanrachel commented 7 months ago

@blms and @kseniaryzhova - is this ready to close?

kseniaryzhova commented 7 months ago

@richmanrachel I'm keeping it open as a reminder to myself, but Ben's portion of the work is done; it's now in our hands. It's lower priority given everything else, though.