JuliaText / CorpusLoaders.jl

A variety of loaders for various NLP corpora.

Support for CoNLL Corpora #20

Closed Ayushk4 closed 5 years ago

Ayushk4 commented 5 years ago

I am adding support for various CoNLL corpora in a generic sense and CoNLL 2003 in particular. ~I am assuming the deprecation fixes from #18.~

Can you review the data website currently used in the DataDeps registration block?

See #10.

oxinabox commented 5 years ago

looks solid to me

Ayushk4 commented 5 years ago

The CoNLL datasets have separate test and train files. Should I modify the struct to hold the paths of these files and then let the user choose whether to load the train or test files? Something like this:

struct CoNLL{S}
    filepaths::Vector{S}
    year::Int
    trainpath::String
    testpath::String
    devpath::String
end

It could then be accessed with something like load(CoNLL(), "train"), which would return the data from train.txt.

oxinabox commented 5 years ago

Yes, that sounds good.
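
For readers following the thread, the split-based access proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `CoNLLSketch`, `path_for`, and `load_split` are hypothetical names, not the CorpusLoaders.jl API, and the loader simply returns the raw lines of the chosen file rather than parsed sentences.

```julia
# Hypothetical sketch of the proposal; NOT the CorpusLoaders.jl implementation.
# One path is stored per split, as suggested in the thread.
struct CoNLLSketch
    year::Int
    trainpath::String
    testpath::String
    devpath::String
end

# Map a split name ("train", "test" or "dev") to the stored file path.
# Throws an ArgumentError for any other name.
function path_for(corpus::CoNLLSketch, which::AbstractString)
    which == "train" && return corpus.trainpath
    which == "test" && return corpus.testpath
    which == "dev" && return corpus.devpath
    throw(ArgumentError("unknown split \"$which\"; expected \"train\", \"test\" or \"dev\""))
end

# Return the raw lines of the requested split (defaults to "train").
# A real loader would parse tokens and tags instead of returning plain lines.
load_split(corpus::CoNLLSketch, which::AbstractString="train") =
    readlines(path_for(corpus, which))

# Usage with a temporary file holding two made-up sample lines.
dir = mktempdir()
trainfile = joinpath(dir, "train.txt")
write(trainfile, "word1 TAG1\nword2 TAG2\n")
corpus = CoNLLSketch(2003, trainfile, joinpath(dir, "test.txt"), joinpath(dir, "dev.txt"))
train_lines = load_split(corpus, "train")
```

Keeping the split name as a plain string mirrors the `load(CoNLL(), "train")` call suggested in the thread; rejecting unknown names early is a design choice of this sketch, not something the thread specifies.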