collab-uniba / Senti4SD

An emotion-polarity classifier specifically trained on developers' communication channels
http://collab.di.uniba.it/research
MIT License

Input file and output file row counts don't match #3

Closed: nasifimtiazohi closed this issue 6 years ago

nasifimtiazohi commented 6 years ago

testinput.xlsx testoutput.xlsx

I worked with the CSV versions of these files. The input file has 1826 rows, but the output file has 1829 rows, and I have no way to tell which output row corresponds to which input row. I just followed the procedure explained in the documentation. Can you please tell me what the problem is?

Furthermore, I don't think it is good design that the output file contains only the labels, without the associated comments or other information. The t0, t1 identifiers don't help me, and I am not sure what they mean.

Can you address this problem quickly? I am trying to use this impressive tool in my research, and I need to run it on over 1 million texts. I am short on time, and if this problem persists, I cannot proceed. @fedemaiorano

fedemaiorano commented 6 years ago

Hi @nasifimtiazohi, I ran classificationTask.sh on a CSV version of testinput.xlsx (testinput.csv.zip; the zip contains the CSV file I used, which I saved with all the text cells quoted). With this CSV, the output file of classificationTask.sh has 1827 rows (the first row is the header).

The tool processes the input texts sequentially, so t0 is the first text of the input file, t1 is the second, and so on.
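Since the identifiers follow the input order, each label can be joined back to its text by index. The following is only a minimal sketch: the `attach_texts` helper, the sample texts, and the label values are hypothetical, and it assumes each prediction row holds just an identifier such as `t0` and a label.

```python
def attach_texts(input_texts, prediction_rows):
    """Pair each (row_id, label) prediction with its source text.

    Assumes row ids look like "t0", "t1", ... and index into input_texts
    in the original input order (header row already removed).
    """
    labeled = []
    for row_id, label in prediction_rows:
        index = int(row_id[1:])  # "t0" -> 0, "t1" -> 1, ...
        labeled.append((row_id, label, input_texts[index]))
    return labeled


# Illustrative data only, not taken from the attached files.
texts = ["Thanks, this works great!", "This bug is so annoying."]
predictions = [("t0", "positive"), ("t1", "negative")]

for row in attach_texts(texts, predictions):
    print(row)
```

This only works if the input and output row counts match, which is why the mismatch has to be resolved first.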

nasifimtiazohi commented 6 years ago

Hi @fedemaiorano, thanks for the quick response.

Can you tell me how you quote-delimited the CSV files? I also need to batch-convert a lot of .xlsx files into such CSV files (comma-separated with quoted fields, right?). Do you have any quick suggestions on how to do that?

fedemaiorano commented 6 years ago

I quoted the cells directly in my spreadsheet application when saving the .xlsx in CSV format. I have never needed to convert a lot of .xlsx files into CSV, so I don't know how to do that. Maybe you can write your own script.
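One possible shape for such a script, as a sketch rather than a tested recipe for this tool: it assumes the third-party `openpyxl` package is installed, that the texts sit on the first (active) worksheet, and that every `.xlsx` file lives in one folder. The function names `write_quoted_csv` and `convert_folder` are made up for illustration.

```python
import csv
from pathlib import Path


def write_quoted_csv(rows, csv_path):
    """Write rows to csv_path with every field wrapped in double quotes."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerows(rows)


def convert_folder(folder):
    """Convert each .xlsx file in folder to a fully quoted .csv next to it."""
    # Third-party dependency, imported here so the CSV helper works without it.
    from openpyxl import load_workbook

    for xlsx_path in sorted(Path(folder).glob("*.xlsx")):
        workbook = load_workbook(xlsx_path, read_only=True)
        sheet = workbook.active  # assumption: data is on the active sheet
        write_quoted_csv(sheet.iter_rows(values_only=True),
                         xlsx_path.with_suffix(".csv"))
        workbook.close()
```

Calling `convert_folder("path/to/folder")` would then produce one `.csv` per `.xlsx` file, with commas as separators and every cell quoted.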

nasifimtiazohi commented 6 years ago

The problem was solved by quote-delimiting the texts in the CSV file.
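The thread does not say why quoting fixed the counts. One plausible cause, offered here as an assumption, is that some texts contained line breaks: without quotes, a parser sees each physical line as a separate row. The sketch below uses Python's standard `csv` module and a made-up sample text to show the difference.

```python
import csv
import io


def count_records(data):
    """Count the rows a CSV parser sees in the given text."""
    return len(list(csv.reader(io.StringIO(data, newline=""))))


text = "first line\nsecond line"  # one cell that contains a line break

# Unquoted: the embedded line break splits the cell into two rows.
unquoted = "text\n" + text + "\n"

# Quoted: the whole cell stays inside one pair of double quotes.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows([["text"], [text]])
quoted = buffer.getvalue()

print(count_records(unquoted))  # 3 (header plus two broken rows)
print(count_records(quoted))    # 2 (header plus one text)
```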