comses / catalog

Web tools to annotate publications related to computational modeling
http://catalog.comses.net
GNU General Public License v3.0

Types of URLs #108

Open MarcoAJanssen opened 7 years ago

MarcoAJanssen commented 7 years ago

For analysis and visualization, we want to keep track of the different types of URLs. For example, in Table 1 of Janssen (2017) we distinguish the following types of URLs: Journal; Personal; OpenABM; SourceForge; GitHub; NetLogo; Cormas; CCPForge; BitBucket; Dataverse; Dropbox; GoogleCode; ResearchGate; and Invalid.

In Figure 3 we aggregated a number of them into OpenSource and Platform. Maybe the current scripts can generate those categories. Otherwise we may include those categories in the form.

cpritcha commented 5 years ago

The type field on URLStatusLog tracks this.

MarcoAJanssen commented 5 years ago

You will have to explain this. I wonder how this can be automated. I do not think it is possible with 100% accuracy, and therefore I suggest that the person who adds the metadata can select from the categories of URLs.

cpritcha commented 5 years ago

The verify_urls task, which is supposed to run monthly, tries each of the code archive urls and categorizes them. Right now the categories are: CoMSES; open source (urls from sourceforge, ccpforge, bitbucket, dataverse.harvard.edu, code.google.com, figshare); platform (urls from modelingcommons.org, ccl.northwestern.edu/netlogo/models/community, cormas.cirad.fr); journal (urls from journals.plos.org); personal (urls from dropbox, researchgate or ending in zip, pdf, txt or docx); and other urls. #148 tracks splitting the url into url_category and whether or not it is valid.
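For readers following along, here is a minimal sketch of the kind of rule-based categorization described above. It is not the actual verify_urls code: the `RULES` table, the `categorize` function, and the category labels are hypothetical names, and the CoMSES hosts (`comses.net`, `openabm.org`) are an assumption, since the comment does not list them. Matching is a plain substring check on host plus path, which is cruder than whatever the real task does.

```python
from urllib.parse import urlparse

# Hypothetical rule table built from the hosts named in the comment above.
# Order matters: the first matching category wins.
RULES = [
    ("CoMSES", ["comses.net", "openabm.org"]),  # assumed hosts, not from the comment
    ("OpenSource", ["sourceforge", "ccpforge", "bitbucket",
                    "dataverse.harvard.edu", "code.google.com", "figshare"]),
    ("Platform", ["modelingcommons.org",
                  "ccl.northwestern.edu/netlogo/models/community",
                  "cormas.cirad.fr"]),
    ("Journal", ["journals.plos.org"]),
    ("Personal", ["dropbox", "researchgate"]),
]

# File extensions that the comment treats as "personal" downloads.
PERSONAL_SUFFIXES = (".zip", ".pdf", ".txt", ".docx")


def categorize(url: str) -> str:
    """Return a category label for a code archive URL (illustrative only)."""
    parsed = urlparse(url)
    # Compare against host + path so path-specific rules (NetLogo community) work.
    target = (parsed.netloc + parsed.path).lower()
    for category, needles in RULES:
        if any(needle in target for needle in needles):
            return category
    if parsed.path.lower().endswith(PERSONAL_SUFFIXES):
        return "Personal"
    return "Other"


if __name__ == "__main__":
    for example in [
        "https://sourceforge.net/projects/foo/",
        "http://ccl.northwestern.edu/netlogo/models/community/Foo",
        "https://example.edu/~me/model.zip",
    ]:
        print(example, "->", categorize(example))
```

Because the check is a substring match, a URL that merely mentions one of these names in its path would also match; that is one reason a human override in the form may still be useful.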

cpritcha commented 5 years ago

Since the URLStatusLog also keeps track of the original url used to make the request, it will be possible to recategorize the urls later if we don't like the categorization (or want more than one).
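A rough sketch of what such a later recategorization pass could look like, assuming each log row exposes the original url and a category. The `LogEntry` dataclass and its field names are hypothetical stand-ins, not the real URLStatusLog model, and `new_rules` is just an example categorizer.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical stand-in for one URLStatusLog row."""
    original_url: str  # assumed field name
    category: str      # assumed field name


def recategorize(entries: Iterable[LogEntry],
                 categorize: Callable[[str], str]) -> List[LogEntry]:
    """Return copies of the entries with categories recomputed from the stored url."""
    return [replace(entry, category=categorize(entry.original_url))
            for entry in entries]


def new_rules(url: str) -> str:
    """Example categorizer: only knows about journals.plos.org."""
    return "Journal" if "journals.plos.org" in url else "Other"


if __name__ == "__main__":
    old = [LogEntry("https://journals.plos.org/plosone/article?id=x", "Unknown")]
    print(recategorize(old, new_rules))
```

The stored entries are left untouched and new copies are returned, so an old and a new categorization can be compared side by side before anything is saved.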

MarcoAJanssen commented 5 years ago

For one of the publications, 73474 (From individuals to population cycles: the role of extrinsic and intrinsic factors in rodent populations), the classification is "Unknown" but it should be "Journal". I tried to change this but I keep getting error messages.

MarcoAJanssen commented 5 years ago

We should add a few new categories: DataDryad (archive); University archives (archive); R (Platform); Amazon Cloudfront (Personal or Organizational).