avanscholar / Ex_scibert

Text mining

question about 'catalys_entit_final.csv' file in Final_anotation_for_classification.ipynb #1

Open nkuhuxu opened 1 year ago

nkuhuxu commented 1 year ago

Dear Dr. Avan Kumar,

It is an honor to write to you. My name is Xu Hu, and I am currently a master's student at Nankai University, majoring in computational electrocatalysis. While reviewing the literature, I read your paper titled “A text mining framework for screening catalysts and critical process parameters from scientific literature - A study on Hydrogen production from alcohol". The application of Natural Language Processing is very interesting, and the detailed code here is useful for my current research.

However, I have a question about the code under "labeling for classification: Catalyst sentences" in Final_anotation_for_classification.ipynb. There, 'df_cat.csv' is generated from 'catalys_entit_final.csv', but conversely, in catalyst_count_process.ipynb, 'catalys_entit_final.csv' is generated from 'df_cat.csv'. I searched all the code and could not find where 'catalys_entit_final.csv' is originally generated. Your help with this question would be much appreciated.

Sincerely,
Xu Hu

avanscholar commented 1 year ago

(1) We extracted all possible chemical entities (as catalyst entities) from the text with the help of the ChemDataExtractor library. (2) The frequency of each unique catalyst was stored in a CSV file (catalys_entit_final.csv).

(3) Then we manually deleted all non-catalyst entities from it and reused the same file (catalys_entit_final.csv) for further screening.
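The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the repository. The function names and the `entity`/`frequency` column names are hypothetical, and the ChemDataExtractor call assumes its 1.x `Document(text).cems` interface. The manual deletion in step (3) happens by hand between writing and reloading the CSV, so no code performs it. The demo substitutes a toy vocabulary-based extractor (also hypothetical) so that it runs without ChemDataExtractor installed.

```python
import csv
import os
import tempfile
from collections import Counter
from typing import Callable, Iterable, List, Set


def extract_with_chemdataextractor(text: str) -> List[str]:
    """Step (1): return the chemical entity mentions found in one text.

    Assumes the ChemDataExtractor 1.x API; imported lazily so the rest of
    this sketch runs without the package installed.
    """
    from chemdataextractor import Document

    return [span.text for span in Document(text).cems]


def count_entities(texts: Iterable[str],
                   extract: Callable[[str], List[str]]) -> Counter:
    """Step (2): count how often each unique entity string occurs."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(extract(text))
    return counts


def write_entity_csv(counts: Counter, path: str) -> None:
    """Write entity frequencies, most frequent first, for manual review."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["entity", "frequency"])  # hypothetical column names
        for entity, freq in counts.most_common():
            writer.writerow([entity, freq])


def load_curated_entities(path: str) -> Set[str]:
    """Step (3): after non-catalyst rows are deleted by hand, reload the rest."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["entity"] for row in csv.DictReader(f)}


def toy_extract(text: str) -> List[str]:
    """Hypothetical offline stand-in for ChemDataExtractor, used only in the demo."""
    vocab = {"Pt", "Ni/Al2O3", "ethanol"}
    return [word for word in text.split() if word in vocab]


if __name__ == "__main__":
    abstracts = ["Pt and Ni/Al2O3 convert ethanol", "Pt on ceria"]
    counts = count_entities(abstracts, toy_extract)
    path = os.path.join(tempfile.mkdtemp(), "catalys_entit_final.csv")
    write_entity_csv(counts, path)
    # Manual step: open the CSV and delete non-catalyst rows (e.g. "ethanol"),
    # then reload the curated list for further screening.
    print(sorted(load_curated_entities(path)))
```

Passing the extractor in as a function keeps the counting and CSV steps independent of ChemDataExtractor, so the real extractor can be swapped in with `count_entities(texts, extract_with_chemdataextractor)`.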

nkuhuxu commented 1 year ago

Thanks for your kind reply; it definitely helps a lot!