Schwittleymani / ECO

Electronic Chaos Oracle
https://schwittlick.net/eco
Apache License 2.0

implement author metadata in w2v model #190

Closed schwittlick closed 7 years ago

schwittlick commented 7 years ago

To get author metadata into the model:

1. In the parser, create a CSV for each parsed file, with one sentence and its filename per line.
2. Concatenate everything into one file, à la parsed_v3_valid_with_authors.txt.
3. In https://github.com/Schwittleymani/ECO/blob/master/src/python/modelbuilder/doc2vec_builder.py#L13, split each line and use the author as a second label (tags accepts a list of labels).
4. Implement get_author_tag(sentence), which returns sentence.tags[1]; sentence.tags[0] is the unique ID.
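
The steps above could be sketched roughly as follows. This is only an illustration, not the code in doc2vec_builder.py: the CSV layout (sentence first, author label second), the `SENT_<n>` id scheme, the helper name `read_tagged_sentences` and the sample rows are all assumptions. The local `TaggedDocument` namedtuple is a stand-in with the same `words` and `tags` fields as gensim's `TaggedDocument`, so the sketch runs without gensim installed.

```python
import csv
import io
from collections import namedtuple

# Stand-in for gensim.models.doc2vec.TaggedDocument (same two fields).
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def read_tagged_sentences(lines):
    """Yield one TaggedDocument per CSV row of the assumed form: sentence,author."""
    for index, row in enumerate(csv.reader(lines)):
        if len(row) < 2:
            continue  # skip rows that lack an author column
        sentence, author = row[0], row[1]
        # tags[0] is the unique id, tags[1] is the author label
        yield TaggedDocument(words=sentence.split(),
                             tags=["SENT_%d" % index, author])


def get_author_tag(sentence):
    """Return the author label, stored as the second tag."""
    return sentence.tags[1]


# Hypothetical sample rows standing in for the concatenated CSV file.
sample = io.StringIO(
    '"the oracle speaks",author_a.txt\n'
    '"chaos is electronic",author_b.txt\n'
)
docs = list(read_tagged_sentences(sample))
```

With these sample rows, `get_author_tag(docs[0])` gives back the second CSV column of the first row, while `docs[0].tags[0]` stays a per-sentence unique id.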

schwittlick commented 7 years ago

Re-parsing everything and exporting to CSV files: /home/marcel/drive/data/eco/NAIL_DATAFIELD_txt/parsed_v4

schwittlick commented 7 years ago

Trained a new doc2vec model:

/mnt/drive1/data/eco/NAIL_DATAFIELD_txt/parsed_v4/parsed_v4_valid.doc2vec
/mnt/drive1/data/eco/NAIL_DATAFIELD_txt/parsed_v4/parsed_v4_valid.doc2vec.docvecs.doctag_syn0.npy

schwittlick commented 7 years ago

Trained again: /mnt/drive1/data/eco/NAIL_DATAFIELD_txt/parsed_v4/parsed_v4_valid_tagged.doc2vec