-
While frequency per 10k is a much better measure than raw counts, it's still not completely independent of corpus size, as Harald Baayen has argued extensively.
I need to reread his book _Word Frequ…
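
For reference, the per-10k normalization mentioned above can be sketched as follows. This is a minimal illustration only: the function name `per_10k` and the toy token list are assumptions, and it shows the normalization itself, not Baayen's argument about its dependence on corpus size.

```python
def per_10k(tokens, word):
    """Return occurrences of `word` per 10,000 tokens (hypothetical helper)."""
    if not tokens:
        return 0.0
    # Raw count divided by corpus length, rescaled to a 10,000-token basis.
    return 10_000 * tokens.count(word) / len(tokens)


# Toy corpus, made up for illustration.
tokens = "the cat sat on the mat".split()
rate = per_10k(tokens, "the")
```

Two corpora of different sizes can then be compared on the same scale, with the caveat from the comment above that the resulting rate is still not fully size-independent.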
-
`data/corpus/softcite_corpus.tei.xml` contains the following articles that have no software annotations in them:
[`article_with_no_mention_in_softcite_corpus_tei_xml.csv`](https://raw.githubusercon…
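
A list like this could be produced with a sketch along the following lines. The element names are assumptions, not confirmed by this report: it supposes each article is a `TEI` element and each software mention is an `rs` element with `type="software"`. The helper name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def articles_without_mentions(xml_text, article_tag="TEI",
                              mention_tag="rs", mention_type="software"):
    """Return positions of articles with no mention element (assumed layout)."""
    root = ET.fromstring(xml_text)
    missing = []
    for index, article in enumerate(root.iter(f"{{{TEI_NS}}}{article_tag}")):
        # An article counts as annotated if any descendant mention has the type.
        annotated = any(
            el.get("type") == mention_type
            for el in article.iter(f"{{{TEI_NS}}}{mention_tag}")
        )
        if not annotated:
            missing.append(index)
    return missing


# Made-up two-article corpus for illustration.
SAMPLE = """<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <TEI><text><body><p>We used <rs type="software">R</rs>.</p></body></text></TEI>
  <TEI><text><body><p>No tools named here.</p></body></text></TEI>
</teiCorpus>"""

result = articles_without_mentions(SAMPLE)
```

On the real corpus, the file contents would be read from `data/corpus/softcite_corpus.tei.xml` and the tag names adjusted to whatever the schema actually uses.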
-
Hi, me again :)
More than mapping ids/names to the NCBI Taxonomy, my use case is to match those with the NCBITaxon ontology: http://www.obofoundry.org/ontology/ncbitaxon.html (which is built autom…
-
DataFrame cannot be serialized to JSON. I assume it gets tripped up by text tokens or insufficient escaping.
-
### Version
1
### DataCap Applicant
aitrain
### Project ID
3
### Data Owner Name
Ai trainer
### Data Owner Country/Region
China
### Data Owner Industry
IT & Technology Services
### Website…
-
Post questions here for this week's orienting readings:
Kozlowski, Austin, Matt Taddy, James Evans. 2019. [“The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.”](http…
-
The accents come out wrong in this word, as shown in the following link.
http://www.perseus.tufts.edu/hopper/text?doc=Xen.+Ways+4.51&fromdoc=Perseus%3Atext%3A1999.01.0209
The code has `εἴρηκα ξύμ…
-
Post questions here for this week's exemplary readings:
4. Stuhler, Oscar. 2021. [“What’s in a category? A new approach to Discourse Role Analysis.”](https://doi.org/10.1016/j.poetic.2021.101568) P…
-
Post your response to our challenge questions.
First, write down two intuitions you have about broad content patterns you will discover in your data. These can be the same as those from last week..…