erc-dharma / tfc-khmer-epigraphy

This repository assembles data produced by the project Corpus des inscriptions khmères (before and during the DHARMA project).
https://dharma.hypotheses.org/
Creative Commons Attribution 4.0 International

Doublons (duplicates) #37

Closed michaelnmmeyer closed 1 year ago

michaelnmmeyer commented 1 year ago

Duplicated files under texts/xml-provisional/doublon-to-check should be excluded from the repository (by adding this directory to .gitignore, for instance). Otherwise it's impossible to tell which files are supposed to be validated and which are not.

arlogriffiths commented 1 year ago

@michaelnmmeyer — this folder was created recently by @ajaniak when (at my request) she restructured the contents of the repo, which were previously organized per encoder. in the process, she determined that a small number of inscriptions had been encoded twice. see this commit.

our encoders @chhomkunthea, @chloechollet and @salomepichon are working on undoubling the files and will inform you once the doublon-to-check folder is ready to be deleted.

michaelnmmeyer commented 1 year ago

@arlogriffiths I understand. My observation is purely technical: I have a program that examines all repositories to find the texts edited so far and that performs some sanity checks, like checking for uniqueness. I made it ignore duplicate files for now.
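For illustration only, a uniqueness check that skips a known duplicates folder could look roughly like the sketch below. This is not the actual program mentioned above: the function name `find_duplicates`, the `IGNORED_DIRS` set, and the choice to compare bare file names are all assumptions made for the example. Only the folder name `doublon-to-check` comes from this thread.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: directories to skip. Only "doublon-to-check" is taken from this thread.
IGNORED_DIRS = {"doublon-to-check"}


def find_duplicates(root, ignored_dirs=IGNORED_DIRS):
    """Return {file name: [relative paths]} for XML file names found more than once.

    Hypothetical helper: uniqueness is judged by file name only, which is an
    assumption of this sketch, not a description of the real checker.
    """
    by_name = defaultdict(list)
    for path in sorted(Path(root).rglob("*.xml")):
        rel = path.relative_to(root)
        # Skip any file that sits inside one of the ignored directories.
        if ignored_dirs.intersection(rel.parts[:-1]):
            continue
        by_name[path.name].append(rel.as_posix())
    # Keep only the names that occur in more than one place.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Calling `find_duplicates("texts")` on a checkout would list any file name that still appears in more than one place outside the ignored folder. Passing `ignored_dirs=set()` includes the ignored folder in the check again.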

chloechollet commented 1 year ago

@chhomkunthea and I have deduplicated the files. I deleted the doublon-to-check folder.