Closed: loilisxka closed this issue 3 months ago
Please post your script, the console output, and the behaviour you expect from the code.
OK. My script is attached. The text being processed is the abstract of a paper. The script captures some chemical entities but misses others. We want it to capture all of them, including names such as low-density polyethylene.
------------------ Original message ------------------ From: "CambridgeMolecularEngineering/chemdataextractor2" @.>; Sent: Wednesday, May 15, 2024, 6:44 PM @.>; @.**@.>; Subject: Re: [CambridgeMolecularEngineering/chemdataextractor2] abstract.records[i].serialize() (Issue #56)
Would you mind posting these on github.com? Attached files do not come through when you reply to the GitHub notification email.
You can use a code snippet like this:
from chemdataextractor.doc import Document
Or you can directly attach your file.
Okay, sorry about that. I'll re-upload my code:
`
from chemdataextractor import Document
from chemdataextractor.parse import R, I, W, Optional, merge
from chemdataextractor.reader import ElsevierXmlReader, HtmlReader, PlainTextReader
from chemdataextractor.model.model import Compound
from chemdataextractor.doc import Paragraph

reader = PlainTextReader()

# The abstract is passed as a single string; each trailing backslash continues the literal.
abstract = Document("Molded seal devices made of crystalline polymers are widely used in high-pressure hydrogen equipment. \
A method for evaluating high-pressure hydrogen permeability was recently reported; however, the evaluation \
cost is extremely high. To select suitable crystalline polymers for molded hydrogen seals or barrier devices, \
a high-pressure hydrogen permeability prediction method using the polymer structure and its conven- tional \
properties is required. In this study, we measured the pressure dependency of the hydrogen permeability of \
lowd-density polyethylene (LDPE), high-density polyethylene (HDPE), and polyamide11 (PA11). We constructed \
the permeation model for crystalline polymers in terms of the tortuosity induced by their higher-order \
structures and free volume change in the amor- phous region evaluated using PVT method for measuring the \
relationship between pres- sure (p), specific volume (v) and temperature (T) in the molten-solid state of a \
polymer. The results of the pressure dependency of hydrogen permeability were reproduced by the developed \
permeation model.")

# Extract records with the Compound model only.
abstract.models = [Compound]

print(abstract.cems[0])
print("The keywords of Abstract:")
print(abstract.records)
for i in range(len(abstract.records)):
    print(abstract.records[i].serialize())
`
I would like to know how to modify my code, or the library source code, so that all chemical entities are identified. If the source code has to be modified, which part should I change?
In short, you can modify the parsing phrases in chemdataextractor.parse.cem_factory to include adjectives like 'low density' in your text.
Thanks. I would still like to know more about cem_factory. Which parameters are used to add adjectives, and how does it work? Which step's output is passed to cem_factory for processing?
A parser object has a root property that generates a set of parsing rules, whose elements come from cem_factory. E.g., see line 199 in chemdataextractor.parse.cem:
@property
def root(self):
You can add new parsing rules to the root phrase. For instance, build something like W('low') + W('density') + original_rule and add it to the expression returned by root.
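To make the idea of prepending words to an existing rule concrete, here is a minimal, library-free sketch. It is not ChemDataExtractor's implementation: the class `P`, the helper `W`, the `scan` method, and the stand-in `original_rule` are all hypothetical and only mimic the `+` (sequence) and `|` (alternative) composition style suggested above. It also assumes the text has already been tokenized so that "low" and "density" are separate tokens.

```python
from typing import Callable, List, Optional

# A matcher takes (tokens, start index) and returns the end index of a match, or None.
Matcher = Callable[[List[str], int], Optional[int]]


class P:
    """Tiny stand-in for a parse element (hypothetical, not the library's class)."""

    def __init__(self, fn: Matcher) -> None:
        self.fn = fn

    def __add__(self, other: "P") -> "P":
        # Sequence: self must match, then other must match right after it.
        def seq(tokens: List[str], i: int) -> Optional[int]:
            j = self.fn(tokens, i)
            return None if j is None else other.fn(tokens, j)
        return P(seq)

    def __or__(self, other: "P") -> "P":
        # Ordered alternative: try self first, fall back to other.
        def alt(tokens: List[str], i: int) -> Optional[int]:
            j = self.fn(tokens, i)
            return j if j is not None else other.fn(tokens, i)
        return P(alt)

    def scan(self, tokens: List[str]) -> List[str]:
        # Collect non-overlapping matches from left to right.
        found, i = [], 0
        while i < len(tokens):
            j = self.fn(tokens, i)
            if j is not None and j > i:
                found.append(" ".join(tokens[i:j]))
                i = j
            else:
                i += 1
        return found


def W(word: str) -> P:
    """Match exactly one token (hypothetical analogue of W in the thread)."""
    return P(lambda tokens, i: i + 1 if i < len(tokens) and tokens[i] == word else None)


# Stand-in for whatever rule the parser already uses (an assumption for this demo).
original_rule = W("polyethylene") | W("polyamide11")

# Extended rule: try the longer adjective-prefixed form first, then the original.
density = W("low") | W("high")
extended_rule = (density + W("density") + original_rule) | original_rule

tokens = ("permeability of low density polyethylene and "
          "high density polyethylene and polyamide11").split()

print(original_rule.scan(tokens))
print(extended_rule.scan(tokens))
```

In this sketch the longer alternative is placed first so that the prefixed name wins over the bare one; whether the real parser needs the same ordering is something to check against the actual expression returned by root.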
Thank you very much, this helps me a lot. The code successfully captured the chemical entity.
Hi, I'm trying to use cde to extract compound names from the literature. However, when I extract with the Compound model, the program fails to pick up names such as "low-density polyethylene". Other cases work as expected. I would like to cover these names by modifying the source code. Could you tell me where I should modify it? Looking forward to your reply.