How to write type

peterdesmet commented 11 months ago

@baskaufs Some context: I created this test repository with @MattBlissett to test how translations could be implemented for the Darwin Core website (without breaking anything there while we test). I also want to build vocabulary pages (such as em) using Jekyll templates, so we don't have to do that with the Jupyter notebooks.

Question about https://dwc.tdwg.org/em/#4-vocabulary: I notice terms have a Type

For https://dwc.tdwg.org/em/#dwcem_e that is written in full: http://www.w3.org/2004/02/skos/core#ConceptScheme
For https://dwc.tdwg.org/em/#dwcem_e001 that is shortened to: Concept

I think it would be best to use one approach for all. What do you prefer (full URL or shortened)?

baskaufs commented 11 months ago

@peterdesmet I think it should be presented as "Concept Scheme". It currently is not because the code that generates the page does not include it as one of the possibilities. That is easily fixed and the page can be regenerated.
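A minimal sketch of what such a fix could look like, assuming the generator maps each type IRI to a display label. The names `TYPE_LABELS` and `type_label` are hypothetical and are not taken from the actual page-generation script:

```python
# Hypothetical lookup table: type IRI -> label shown on the page.
TYPE_LABELS = {
    "http://www.w3.org/2004/02/skos/core#Concept": "Concept",
    "http://www.w3.org/2004/02/skos/core#ConceptScheme": "Concept Scheme",
}


def type_label(type_iri: str) -> str:
    """Return a human-readable label for a term's type IRI."""
    # Fall back to the full IRI so an unlisted type is shown rather than dropped.
    return TYPE_LABELS.get(type_iri, type_iri)
```

With a table like this, every known type is shortened the same way, and only types that were never listed fall back to the full URL.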

Before you embark on this effort, you, me, @ben-norton, and @tucotuco need to get together and agree on a plan. Currently, all of the list of terms pages in DwC and AC are built using stand-alone Python scripts (not Jupyter notebooks) that I wrote. All of the Quick Reference Guides (DwC, Chrono, and soon to be Humboldt) are built by other Python scripts that were not written by me, but are maintained by John and strung together by another Python script I wrote. Ben has created a system for generating the Latimer Core pages using Flask.

I recognize that the current Python scripts are not great and they should be replaced by something more generic written by someone who knows what they are doing. But Ben should be brought into this because he is advocating using his Python/Flask scripts that he has already written for everything. I would like to be involved in this discussion because there is a sequence of events that needs to be followed to make the human-readable pages mesh with the machine-readable data. I made a flow chart about this, which I didn't get posted because I got distracted. But this workflow has to be integrated into the Maintenance Group process, so we should talk about this. Also, it needs to be designed for all TDWG vocabularies and not just DwC.

ben-norton commented 11 months ago

I agree with every word in the response from @baskaufs. My goal is to complete the generalized version of the doc generator by the end of 2023. I've blocked out several days in Dec to do so.

https://github.com/ben-norton/stadocgen

It uses the exact same csv files Steve uses to generate the rdf.

https://tdwg.github.io/ltc/ The pages are generated using 4 CSV files (including SKOS mappings) and 4 markdown files (for the content headers). Drop them in a folder > run a single command > done. The CSV files are exactly what is sent to Steve. I know Camtrap pages are different, but that's not a difficult change to make using stadocgen. Latimer Core is very big. I was able to knock out most use cases during development by using it as a test case.

peterdesmet commented 11 months ago

Hi @ben-norton that sounds great!

Can stadocgen handle translations? I suggest you get in touch with @MattBlissett so you know how to streamline translations to and from Crowdin.
While at the GBIF offices, I set up this repository to test building the pages and supporting translations using Jekyll (i.e. what is currently used by Darwin Core and Camtrap DP). Since you're working on stadocgen, I'll stop this for now. If you are interested in what I was able to support, see https://github.com/gbif/dwc-jekyll/issues/2. Feedback on what stadocgen could provide and where Jekyll can come in would be useful.

ben-norton commented 11 months ago

@peterdesmet There's an intersection of our two efforts. I'm sure Python has translation capabilities. Jekyll is Ruby and Flask is Python, but there's a solution. Stadocgen also does several transformations of the CSV files. How are translations being implemented (how do you switch languages on the frontend)?

MattBlissett commented 10 months ago

I'm prompted by JBIF (Japan), who are now looking at translating DwC + vocabularies in the Crowdin project.

The process I've set up takes CSV files from rs.tdwg.org, converts them into a format more friendly to Crowdin, then reverses the conversion to generate another CSV file (example English, translations).
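A rough sketch of this kind of round trip, under stated assumptions: the Crowdin-friendly format is taken to be a flat key/value mapping, and the column names used in `SAMPLE` are only illustrative. This is not the actual conversion code:

```python
import csv
import io

# Illustrative input only; the real CSV files come from rs.tdwg.org.
SAMPLE = "term_localName,label,definition\ne001,Sampling event,An example definition\n"


def csv_to_strings(csv_text, key_column, text_columns):
    """Flatten translatable CSV cells into a {"<key>.<column>": text} dict."""
    strings = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in text_columns:
            strings[f"{row[key_column]}.{column}"] = row[column]
    return strings


def strings_to_csv(csv_text, key_column, text_columns, translated):
    """Rebuild the CSV, replacing translatable cells with translated text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in text_columns:
            key = f"{row[key_column]}.{column}"
            # Keep the English source text when no translation exists yet.
            row[column] = translated.get(key, row[column])
        writer.writerow(row)
    return out.getvalue()
```

The flat dictionary can be written out with `json.dump` for upload, and the translated file read back with `json.load` before calling `strings_to_csv`. Untranslated cells stay in English, so a partially translated language still yields a complete CSV file.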

Other bits of the website will also need translation: words not in the CSV like "Term name", the introduction at the top of the page, the menu, the footer and so on. It would be best to use a standard Python translation tool to support these, with the dictionary in a widely supported format. GNU Gettext PO is a good option, or JSON. These can then be added directly to Crowdin.

Peter's frontend example (English, Dutch) is described in #2.

gbif / dwc-jekyll

How to write type #1