peterdesmet opened this issue 11 months ago
@peterdesmet I think it should be presented as "Concept Scheme". The reason it is not is that the code that generates the page does not consider that as one of the possibilities. That is easily fixed, and the page can be regenerated.
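For illustration only (not the actual page-generation script), a minimal sketch of how the generator could map term type IRIs to human-readable labels, so that the SKOS concept scheme is shown as "Concept Scheme" rather than the full URL; the mapping entries here are assumptions:

```python
# Hypothetical lookup table from type IRI to display label (illustrative only;
# the real script may handle this differently).
TYPE_LABELS = {
    "http://www.w3.org/2004/02/skos/core#Concept": "Concept",
    "http://www.w3.org/2004/02/skos/core#ConceptScheme": "Concept Scheme",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property": "Property",
    "http://www.w3.org/2000/01/rdf-schema#Class": "Class",
}

def display_type(type_iri: str) -> str:
    """Return a human-readable label for a term type, falling back to the IRI."""
    return TYPE_LABELS.get(type_iri, type_iri)
```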
Before you embark on this effort, you, me, @ben-norton, and @tucotuco need to get together and agree on a plan. Currently, all of the list of terms pages in DwC and AC are built using stand-alone Python scripts (not Jupyter notebooks) that I wrote. All of the Quick Reference Guides (DwC, Chrono, and soon to be Humboldt) are built by other Python scripts that were not written by me, but are maintained by John and strung together by another Python script I wrote. Ben has created a system for generating the Latimer Core pages using Flask.
I recognize that the current Python scripts are not great and that they should be replaced by something more generic written by someone who knows what they are doing. But Ben should be brought into this because he is advocating using his existing Python/Flask scripts for everything. I would like to be involved in this discussion because there is a sequence of events that needs to be followed to make the human-readable pages mesh with the machine-readable data. I made a flow chart about this, which I haven't posted yet because I got distracted. This workflow has to be integrated into the Maintenance Group process, so we should talk about it. Also, it needs to be designed for all TDWG vocabularies, not just DwC.
I agree with every word in the response from @baskaufs. My goal is to complete the generalized version of the doc generator by the end of 2023. I've blocked out several days in December to do so.
https://github.com/ben-norton/stadocgen
It uses the exact same CSV files Steve uses to generate the RDF.
https://tdwg.github.io/ltc/ The pages are generated using 4 CSV files (including SKOS mappings) and 4 markdown files (for the content headers). Drop them in a folder > run a single command > done. The CSV files are exactly what is sent to Steve. I know the Camtrap pages are different, but that's not a difficult change to make using stadocgen. Latimer Core is very big, so I was able to knock out most use cases during development by using it as a test case.
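Not stadocgen itself, just a minimal sketch of the general pattern described above, assuming a folder with a markdown header file and a terms CSV (the file and column names below are made up for illustration):

```python
# Sketch of "CSV + markdown in a folder -> one generated page"; not stadocgen's
# actual code or command. File and column names are assumptions.
import csv
from pathlib import Path

SRC = Path("input")          # folder holding the CSV and markdown files
OUT = Path("docs/index.md")  # generated documentation page

def build_page() -> None:
    header = (SRC / "header.md").read_text(encoding="utf-8")
    with open(SRC / "terms.csv", newline="", encoding="utf-8") as f:
        terms = list(csv.DictReader(f))
    lines = [header, "", "| Term | Definition |", "| --- | --- |"]
    lines += [f"| {t['term_localName']} | {t['definition']} |" for t in terms]
    OUT.parent.mkdir(parents=True, exist_ok=True)
    OUT.write_text("\n".join(lines) + "\n", encoding="utf-8")

if __name__ == "__main__":
    build_page()
```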
Hi @ben-norton, that sounds great!
@peterdesmet There's an intersection of our two efforts. Python has translation capabilities. Jekyll is Ruby and Flask is Python, but there's a solution. Stadocgen also does several transformations of the CSV files. How are translations being implemented (how do you switch languages on the frontend)?
I'm prompted by JBIF (Japan), who are now looking at translating DwC and its vocabularies in the Crowdin project.
The process I've set up takes CSV files from rs.tdwg.org, converts them into a format more friendly to Crowdin, then reverses the conversion to generate another CSV file (e.g. English source alongside the translations).
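A rough sketch of what such a round trip could look like; the column names (`term_localName`, `definition`, `source`, `translation`) are assumptions for illustration, not necessarily what rs.tdwg.org or the actual conversion script uses:

```python
# Illustrative pivot of a term CSV into "one translatable string per row" for
# Crowdin, and the reverse merge back into a source/translation CSV.
import csv

def to_crowdin(src_csv: str, out_csv: str) -> None:
    """Write one row per translatable field: a stable key plus the English source."""
    with open(src_csv, newline="", encoding="utf-8") as f, \
         open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["key", "source"])
        for row in csv.DictReader(f):
            writer.writerow([f"{row['term_localName']}.definition", row["definition"]])

def from_crowdin(crowdin_csv: str, out_csv: str) -> None:
    """Reverse the conversion: key, English source, translated text."""
    with open(crowdin_csv, newline="", encoding="utf-8") as f, \
         open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["key", "en", "translation"])
        for row in csv.DictReader(f):
            writer.writerow([row["key"], row["source"], row.get("translation", "")])
```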
Other bits of the website will need translation too: words not in the CSV, like "Term name", the introduction at the top of the page, the menu, the footer, and so on. It would be best to use a standard Python translation tool for these, with the dictionary in a widely supported format; GNU gettext PO is a good option, or JSON. These can then be added directly to Crowdin.
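For example, a minimal gettext lookup in Python could look like the sketch below; the `locales` directory and the `site` catalogue name are placeholders, not an existing setup:

```python
# Look up UI strings such as "Term name" from a GNU gettext catalogue,
# falling back to the English string when no translation exists.
import gettext

def get_translator(lang: str):
    """Load locales/<lang>/LC_MESSAGES/site.mo, falling back to English."""
    return gettext.translation("site", localedir="locales",
                               languages=[lang], fallback=True)

_ = get_translator("nl").gettext
print(_("Term name"))  # prints the Dutch string if the catalogue provides one
```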
Peter's frontend example (English, Dutch) is described in #2.
@baskaufs Some context: I created this test repository with @MattBlissett to test how translations could be implemented for the Darwin Core website (without breaking anything there while we test). I also want to build vocabulary pages (such as `em`) using Jekyll templates, so we don't have to do that with the Jupyter notebooks.

Question about https://dwc.tdwg.org/em/#4-vocabulary: I notice the Type of terms is shown either as the full URL (http://www.w3.org/2004/02/skos/core#ConceptScheme) or as a shortened label (Concept). I think it would be best to use one approach for all. What do you prefer (full URL or shortened)?