TeangaNLP / teanga2

Teanga a dó
Apache License 2.0

Better indexing of docs #42

Open jmccrae opened 6 days ago

jmccrae commented 6 days ago

Proposed API changes to make documents easier to access.

Current API:

# Iterate documents
for (docid, doc) in corpus.docs:
  print(docid)

# Get document by ID
doc = corpus.doc_by_id('abcd')

# Get nth document
doc = corpus.doc_by_id(corpus.doc_ids[10])

Proposed API:

# Iterate documents
for doc in corpus.docs:
  print(doc.id)

# Get document by ID
doc = corpus['abcd']

# Get nth document
doc = corpus[10]

jmccrae commented 4 days ago

Also support slices (ranges of positions), returning a subset of the corpus:

corpus_subset = corpus[:10]
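
One way the proposed indexing could be wired up is a single `__getitem__` that dispatches on the key type. The sketch below is illustrative only and is not the teanga2 implementation: the `Document` and `Corpus` classes, the `add_doc` helper, and the `_docs` attribute are hypothetical stand-ins, and it assumes that documents keep their insertion order and that a slice should return a new corpus rather than a list.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Document:
    """Hypothetical stand-in for a document; only `id` and `text` are modelled."""
    id: str
    text: str


class Corpus:
    """Hypothetical corpus that keeps documents in insertion order."""

    def __init__(self) -> None:
        # Plain dicts preserve insertion order (Python 3.7+),
        # so positional lookups are stable.
        self._docs = {}

    def add_doc(self, doc_id: str, text: str) -> Document:
        # Hypothetical helper used only to populate the example.
        doc = Document(doc_id, text)
        self._docs[doc_id] = doc
        return doc

    @property
    def docs(self) -> Iterator[Document]:
        # Yields documents directly, so callers read `doc.id`
        # instead of unpacking (docid, doc) pairs.
        return iter(self._docs.values())

    def __len__(self) -> int:
        return len(self._docs)

    def __getitem__(self, key):
        if isinstance(key, str):
            # corpus['abcd']: lookup by document ID.
            return self._docs[key]
        if isinstance(key, int):
            # corpus[10]: lookup by position; negative indices work as for lists.
            return self._docs[list(self._docs)[key]]
        if isinstance(key, slice):
            # corpus[:10]: build a new corpus holding the selected documents.
            subset = Corpus()
            for doc_id in list(self._docs)[key]:
                subset._docs[doc_id] = self._docs[doc_id]
            return subset
        raise TypeError(f"unsupported index type: {type(key).__name__}")


corpus = Corpus()
for i in range(20):
    corpus.add_doc(f"doc{i}", f"text {i}")

by_id = corpus["doc3"]
nth = corpus[10]
corpus_subset = corpus[:10]
```

A design point this leaves open is ambiguity between keys: if document IDs could ever be integers rather than strings, `corpus[10]` would no longer distinguish "ID 10" from "the document at position 10", so the dispatch relies on IDs always being strings.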