erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

Use of `<ref>` #298

Closed michaelnmmeyer closed 2 months ago

michaelnmmeyer commented 2 months ago

A minor remark I am not sure I made: when referring to inscriptions with <ref> (EGD §10.4.6, "Referring to inscriptions in the DHARMABase"), the use of @n for indicating a repository is not needed (because all texts share a single namespace), and adding the .xml extension is also unnecessary.

Thus

<ref n="tfa-pallava-epigraphy" target="INSPallava00001.xml">Pallava 1</ref>

can be written

<ref target="INSPallava00001">Pallava 1</ref>

Internally, all four variants (DHARMA_INSPallava00001, DHARMA_INSPallava00001.xml, INSPallava00001, and INSPallava00001.xml) are made to point to https://dharmalekha.info/texts/INSPallava00001.
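As a rough illustration of that normalization, here is a minimal sketch. It is not the actual DHARMA processing code: the function name `resolve_target` is made up for this example, and only the URL prefix is taken from the remark above.

```python
# Hypothetical sketch, not the real DHARMA code: strip the optional
# "DHARMA_" prefix and ".xml" extension, then build the canonical URL.
BASE_URL = "https://dharmalekha.info/texts/"

def resolve_target(target: str) -> str:
    """Map any of the four accepted spellings to the same text URL."""
    ident = target.strip()
    if ident.endswith(".xml"):
        ident = ident[: -len(".xml")]
    if ident.startswith("DHARMA_"):
        ident = ident[len("DHARMA_"):]
    return BASE_URL + ident

if __name__ == "__main__":
    for spelling in (
        "DHARMA_INSPallava00001",
        "DHARMA_INSPallava00001.xml",
        "INSPallava00001",
        "INSPallava00001.xml",
    ):
        print(spelling, "->", resolve_target(spelling))
```

All four spellings end up at the same address, which is why the choice between them is a matter of convention rather than a technical necessity.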

danbalogh commented 2 months ago

Thanks, I have not heard this before. I'm very happy about lifting the part about using n to identify the repository. As for the filenames, I think it would be better to keep them systematic and use the full prefix all the time, perhaps also the .xml extension. That way there is, I think, a better chance of interpreting the files if someone else comes across them years from now and does not necessarily have access to your processing. If you disagree with that and say that it is technically not a problem, then I accept that, but even then, I would prefer to have just a single "legitimate" way to encode such references. I don't think it's good to have a guide that says "you can do A, B, C or D and it doesn't matter which, so just pick whichever you like at the moment". So if you are sure that using the full filename (DHARMA_INSPallava00001.xml) is not better practice or more future-proof in any way than the others, then let's pick the simplest form (INSPallava00001) as the only approved solution; otherwise, let's use the full filename and make the schema flag anything else as an error instead of processing it to make it work.

Whichever is chosen, the schema may need some alterations. At the moment, @n still comes up in Oxygen as a permitted attribute for <ref>, and I think there are no other circumstances in which one would want an n on a ref. So that can be discarded, and the existing instances of n on ref can be deleted from the files. For ref, I also get a suggestion list consisting of the filenames in the same folder as the present file, which I like, since most of my crossreferences are to my own subcorpus. But if we enforce a short reference instead of the full filename, then this needs to be changed.

michaelnmmeyer commented 2 months ago

Using the full file name is indeed probably the safest bet, if only for the autocomplete feature you pointed out. This might also discourage people from inventing file names; I am seeing a lot of names like AirAsih.xml, WanuaTengahIII, etc., that cannot be resolved to a real file.
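To make "cannot be resolved to a real file" concrete, here is a small sketch of such a check. The helper names `xml_names` and `unresolved_targets` are invented for this example, and the directory passed to `xml_names` is whatever folder holds the edition files in a given repository.

```python
from pathlib import Path

def xml_names(texts_dir):
    """Collect the names of all .xml files directly inside texts_dir."""
    return {p.name for p in Path(texts_dir).glob("*.xml")}

def unresolved_targets(targets, existing_names):
    """Return the targets that do not match any existing file name exactly."""
    return [t for t in targets if t not in existing_names]

if __name__ == "__main__":
    # Illustrative data only; in practice existing would come from xml_names().
    existing = {"DHARMA_INSPallava00001.xml"}
    targets = ["DHARMA_INSPallava00001.xml", "AirAsih.xml", "WanuaTengahIII"]
    print(unresolved_targets(targets, existing))
```

With full file names as targets, the check is a plain exact-match lookup, with no guessing about prefixes or extensions.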

danbalogh commented 2 months ago

Right. Shall we then agree that <ref target="DHARMA_INSPallava00001.xml">Pallava 1</ref> will be the only acceptable form of encoding a reference to a DHARMA inscription edition?
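Until the schema enforces this, the rule could be checked along the following lines. This is only a sketch under assumptions: the pattern `TARGET_RE` is my guess at what "full filename" should allow (it is deliberately loose so that temporary IDENK names still pass), and targets that start with `#` or contain a colon are assumed to be something other than inscription references and are skipped.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of a compliant target: full prefix, identifier, .xml extension.
TARGET_RE = re.compile(r"DHARMA_INS[A-Za-z0-9-]+\.xml")

def ref_problems(xml_text):
    """Return (target, problem) pairs for <ref> elements that break the rule."""
    problems = []
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Compare local names so the check works with or without a namespace.
        if el.tag.rsplit("}", 1)[-1] != "ref":
            continue
        target = el.get("target", "")
        # Assumption: fragments and URLs are not inscription references.
        if target.startswith("#") or ":" in target:
            continue
        if el.get("n") is not None:
            problems.append((target, "@n should be removed"))
        if not TARGET_RE.fullmatch(target):
            problems.append((target, "target is not a full DHARMA filename"))
    return problems

if __name__ == "__main__":
    sample = (
        '<p xmlns="http://www.tei-c.org/ns/1.0">'
        '<ref n="tfa-pallava-epigraphy" target="INSPallava00001.xml">Pallava 1</ref>'
        '<ref target="DHARMA_INSPallava00001.xml">Pallava 1</ref>'
        "</p>"
    )
    for target, problem in ref_problems(sample):
        print(target, "-", problem)
```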

michaelnmmeyer commented 2 months ago

OK, perfect.

arlogriffiths commented 2 months ago

Curiously, I had never been aware of the theoretical obligation to use @n on <ref>, and in blissful unawareness have been encoding as per the result of your discussion. I have no objections.

@michaelnmmeyer : can you extract a list of non-compliant names like AirAsih.xml and WanuaTengahIII.xml, so I can fix them? I hope you are aware that in general all file names bearing the string IDENK are not yet FNC-compliant, in that they use inscription names rather than numbers, as a temporary solution while we are waiting for the IDENK database (idenk.net) to deliver inscription numbers for these items. This will start to happen within the next half year, I hope.

michaelnmmeyer commented 2 months ago

@arlogriffiths

Here is the list of references. It is probably too long to be useful, though.

danbalogh commented 2 months ago

BadamiCalukya00004-Kopparam-Pulakesin.xml was in my corpus, now corrected.

arlogriffiths commented 2 months ago

Thanks. I have converted the above list to a task list with check boxes and started weeding out the offending cases. @ryosukefurui @ekobastiawan @tyassanti @chhomkunthea @salomepichon @chloechollet @wayanjarrah : Please read the above discussion, then help make our files compliant with the precise rules for use of <ref>. Search for offending strings using the "Find/Replace in Files" function, choosing the repository where each case is suspected to occur. Correct the relevant file and check the item in the list above.

Examples of correct references for the tfc repositories:

<ref target="DHARMA_INSCIK00011.xml">K. 11</ref> <ref target="DHARMA_INSCIC00017.xml">C. 17</ref> <ref target="DHARMA_INSIDENKWintangMasB.xml">Wintang Mas B</ref>

Don't hesitate to ask if anything more needs to be explained.
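For anyone who prefers a script to "Find/Replace in Files", the same search can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names are invented, the pattern for a compliant target is a loose guess modelled on the examples above, and targets that start with `#` or contain a colon are treated as non-inscription links and ignored.

```python
import re
from pathlib import Path

# Capture the target attribute of every <ref ...> start tag in raw text.
REF_TARGET_RE = re.compile(r'<ref\b[^>]*\btarget="([^"]*)"')
# Loose guess at a compliant target, e.g. DHARMA_INSCIK00011.xml.
COMPLIANT_RE = re.compile(r"DHARMA_INS[A-Za-z0-9-]+\.xml")

def offending_targets(text):
    """List target strings in text that are not full DHARMA file names."""
    return [
        t for t in REF_TARGET_RE.findall(text)
        if ":" not in t and not t.startswith("#")
        and not COMPLIANT_RE.fullmatch(t)
    ]

def scan_repository(repo_dir):
    """Map each .xml file under repo_dir to its offending targets, if any."""
    report = {}
    for path in sorted(Path(repo_dir).rglob("*.xml")):
        bad = offending_targets(path.read_text(encoding="utf-8"))
        if bad:
            report[str(path)] = bad
    return report

if __name__ == "__main__":
    # Scans the current directory; point it at a local clone instead.
    for filename, targets in scan_repository(".").items():
        print(filename, targets)
```

The script only reports; the corrections themselves are still best made by hand in each file.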

ekobastiawan commented 2 months ago

@michaelnmmeyer : Do you know why I can't tick the boxes above?

michaelnmmeyer commented 2 months ago

@ekobastiawan I have no idea. This might require an administrator account.

arlogriffiths commented 2 months ago

I have added @ekobastiawan among assignees. Can you try again now, Eko?

If that too fails, we will need to split up the above list and create a separate list per repo. But even on tfd-nusantara-epigraphy, does Eko have administrator rights?

danbalogh commented 2 months ago

I am able to tick and untick. I'm assigned, but not an admin as far as I know. So I guess Eko should be fine now that he is assigned.

ekobastiawan commented 2 months ago

@arlogriffiths : I still can't tick the boxes

arlogriffiths commented 2 months ago

I suspect the problem has to do with other-than-admin-level access to the repo, which Dan has but Eko doesn't.

Can you look into this, @michaelnmmeyer? Can we do something about it?

danbalogh commented 2 months ago

Sounds logical. I'm afraid I have no idea how to check my level of access.

manufrancis commented 2 months ago

@ekobastiawan Make sure you have signed in to your GitHub account.

michaelnmmeyer commented 2 months ago

@ekobastiawan I gave you write access to the repo.

ekobastiawan commented 2 months ago

@michaelnmmeyer Thanks a lot, I am now able to tick the boxes

chhomkunthea commented 2 months ago

Dear all,

I cannot check the boxes in the list above. I think that I am already signed in to GitHub. Maybe I have not been given access or am in the wrong place. Can you please help?

Best, Kunthea

salomepichon commented 2 months ago

Dear all,

I've for now made the modifications for the cam corpus. I haven't been able to locate the cases of C0087.xml and C0096.xml, however.

arlogriffiths commented 2 months ago

@michaelnmmeyer :

  1. can you give @chhomkunthea write access in the same way that you did for Eko?
  2. like @salomepichon, I am unable to find cases of C0087.xml and C0096.xml in the tfc-campa-epigraphy repo. Where should we be looking?

chhomkunthea commented 2 months ago

Dear all,

There is also a problem in my files (K. 11, K. 56, K. 77, K. 417 and K. 582). Among them, only K. 56 has a <ref target="DHARMA ...> markup. And I don't see K. 136.xml in the folder "xml-provisional".

Actually, there are files, especially the hospital inscriptions of Jayavarman VII (K. 12, K. 368, K. 375 ...) which contain many markups. They conform to the norm, i.e. without the @n.

Best, Kunthea

arlogriffiths commented 2 months ago

I think you may have misunderstood the nature of the list above. It is not a list of files to be opened and checked, but a list of strings to be searched (in your case in tfc-khmer-epigraphy) and to be replaced by the correct string. For example, if you use "Search/Replace in files" to search the string K0379.xml, you will find one occurrence, namely in the file DHARMA_INSCIK00216-S.xml. In that file, you need to replace <ref target="K0379.xml">K. 379</ref> by <ref target="DHARMA_INSCIK0379.xml">K. 379</ref> and then tick K0379.xml in the list above. Is it clear now?

chhomkunthea commented 2 months ago

Yes, it is. Thank you!

It seems that there is one zero missing in the file name. Should it be "DHARMA_INSCIK00379" instead of "DHARMA_INSCIK0379" ?

arlogriffiths commented 2 months ago

Indeed, small typo from my side. Sorry. Please do add that zero.
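A slip like this is easy to avoid if the file name is built rather than typed. The sketch below assumes that numbers are zero-padded to five digits, as in the examples earlier in this thread (DHARMA_INSCIK00011.xml, DHARMA_INSPallava00001.xml); the helper name `dharma_filename` is invented for the example.

```python
def dharma_filename(corpus: str, number: int, width: int = 5) -> str:
    """Build a full file name, zero-padding the number (assumed five digits)."""
    return f"DHARMA_INS{corpus}{number:0{width}d}.xml"

if __name__ == "__main__":
    print(dharma_filename("CIK", 379))
    print(dharma_filename("Pallava", 1))
```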

chhomkunthea commented 2 months ago

Well noted with thanks.

chhomkunthea commented 2 months ago

Dear all,

FYI, I have corrected the references related to K0011 through K1284 in the list above. I hope that they are all fine now.

Best, Kunthea

michaelnmmeyer commented 2 months ago

@chhomkunthea You should now be able to tick boxes.

chhomkunthea commented 2 months ago

Thank you very much! Yes, it's done now.

arlogriffiths commented 2 months ago

@michaelnmmeyer : could you help us track down C0087.xml and C0096.xml?

https://github.com/erc-dharma/project-documentation/issues/298#issuecomment-2101934409

michaelnmmeyer commented 2 months ago

@arlogriffiths They have been corrected in the meantime.

arlogriffiths commented 2 months ago

Thanks.

@michaelnmmeyer : can you tell me where to look for Dk0019.xml and Dk0020.xml?

@ryosukefurui : all remaining items concern tfc-bengalcharters-epigraphy: can you take care of them?

ryosukefurui commented 2 months ago

I have just corrected the relevant ref in DHARMA_INSBengalCharters00065.xml and ticked the item in the list. Apologies for the delayed response.

michaelnmmeyer commented 2 months ago

@arlogriffiths

Dk0019.xml and Dk0020.xml are both in tfb-daksinakosala-epigraphy/texts/DHARMA_INSDaksinaKosala00021.xml

arlogriffiths commented 2 months ago

Thanks. So the remaining work is for @NatasjaSB. I don't know if she is still following GitHub, and anyhow I assume @danbalogh can easily make the small modifications in her XML files on her behalf.

So @danbalogh, could you take care of this and then close this issue?

danbalogh commented 2 months ago

I've made the correction in the DaksinaKosala file. Natasja has recently renamed her files at our request, to follow the pattern used in other collections (INSDaksinaKosala instead of INSDk), and I assume that she did not think to check for and update existing references to files when she did that rename. Her repository seems to contain no other obsolete references.