Open caseyfitz opened 3 years ago
More specifically, inside the container we find many files that aren't in this repository:
└── ubuntu
├── application.py
├── config.py
├── data
│ ├── CombinedDictionaryMap.json
│ ├── CombinedNGRAMMatrixCSR.pkl
│ ├── FOSIndex.json
│ ├── FOSMAP.json
│ ├── OSDG-Ontology.json
│ ├── SdgThresholds.json
│ ├── Spacy_bigram_th1.md
│ ├── spacy_idf_th1.json
│ └── spacy_trigram_th1.md
├── Dockerfile
├── exceptions.py
├── get_data.py
├── index_html
├── LICENSE
├── __pycache__
│ ├── config.cpython-37.pyc
│ ├── exceptions.cpython-37.pyc
│ ├── sdgFinder.cpython-37.pyc
│ └── utils.cpython-37.pyc
├── README.md
├── requirements.txt
├── sampleAPICall.py
├── sdgFinder.py
├── setup.sh
└── utils.py
Are these maintained in a public repository?
@caseyfitz
Thanks for your question!
The answer is: not yet, but we will put these in the public repo by the end of the month, so they should be online from 1st February 2021. However, we will move the repository to a new address (https://github.com/osdg-ai/osdg-tool) and the full source code will be posted there.
We are currently cleaning and refactoring the code to make it more readable and user-friendly.
@lukas-pkl, looking forward to it, thanks!
@lukas-pkl a quick related question (then I'll make sure to close).
I'm wondering how to interpret the "quota_9" field in the file SdgThresholds.json, of the form
{
"SDG_1":
{"LowerTh": 2, "UpperTh": 4, "quota_9": 6},
"SDG_2":
{"LowerTh": 2, "UpperTh": 6, "quota_9": 20},
"SDG_3":
....
which is used in sdgFinder.py to divide the relevance scores for each SDG:
sdg_res_raw_fosNames[key] = plh3
# Applying .9 quota
self.sdg_res = sorted(sdg_res_raw_n.items(), key=lambda kv: kv[1] / self.sdgThresholds[kv[0]]['quota_9'], reverse=True)
self.sdg_res_det = {}
I couldn't find this term referenced in the main repo or the arxiv paper.
Thanks!
@caseyfitz, we are addressing issues like this in our current refactoring.
Basically, quota_9 is a parameter we use to sort the SDGs before producing the output.
One of the issues we faced was that the API sometimes produces too many SDG labels, even with thresholds applied.
As such, we have decided to limit the API output to three SDG labels. We select the top three labels using the quota_9 parameter, which we set by assigning SDG tags to a pool of publications and analyzing the distribution of SDG-FOS'es.
The parameter corresponds to the 90th percentile of the distribution for each SDG, which means that we rank a publication's SDGs by how close they come to this mark.
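To make the ranking concrete, here is a minimal sketch rather than the actual sdgFinder.py code. The helper names percentile_90 and top_sdgs are hypothetical, the raw counts are made up for illustration, and the nearest-rank percentile method is an assumption, since the thread does not say how the percentile was computed. Only the two threshold entries come from the SdgThresholds.json excerpt quoted above.

```python
def percentile_90(values):
    """Nearest-rank 90th percentile of a non-empty list (assumed method)."""
    ordered = sorted(values)
    # Integer form of ceil(0.9 * n), avoiding floating-point rounding.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def top_sdgs(raw_counts, thresholds, limit=3):
    """Rank SDGs by raw score divided by their quota_9 and keep the top `limit`."""
    ranked = sorted(
        raw_counts.items(),
        key=lambda kv: kv[1] / thresholds[kv[0]]["quota_9"],
        reverse=True,
    )
    return [sdg for sdg, _ in ranked[:limit]]


# The two entries shown in the SdgThresholds.json excerpt.
thresholds = {
    "SDG_1": {"LowerTh": 2, "UpperTh": 4, "quota_9": 6},
    "SDG_2": {"LowerTh": 2, "UpperTh": 6, "quota_9": 20},
}

# Hypothetical raw scores for one publication.
raw_counts = {"SDG_1": 3, "SDG_2": 8}

# SDG_1 reaches 3/6 = 0.5 of its quota, SDG_2 only 8/20 = 0.4,
# so SDG_1 ranks first despite the lower raw score.
print(top_sdgs(raw_counts, thresholds))  # ['SDG_1', 'SDG_2']
```

Dividing by quota_9 puts every SDG on a comparable scale, so an SDG that typically collects many matches does not crowd out one that typically collects few.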
We are preparing an update to the arxiv paper, which we will present at a conference in July. We will update the arxiv version after the event.
Let me know if anything else comes up!
Thank you!
I'm interested in the source for the tool itself, e.g., the Dockerfile and the scripts run by the container.