Zuckonit opened this issue 4 years ago (status: open).
Sure. Here are the steps, assuming that the CBETA XML files are in a directory called source_dir, that you want 1- to 6-grams, and that the catalogue is called catalogue.txt:
- Create the corpus from the XML:
tacl prepare source_dir xml_dir
tacl strip xml_dir corpus_dir
- Create the database:
tacl ngrams cbeta.db corpus_dir 1 6
- Run the diff:
tacl diff cbeta.db corpus_dir catalogue.txt > diff-results.csv
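If you want to script the steps above, here is a minimal Python sketch. It assumes tacl is installed and on your PATH and that each command takes exactly the arguments shown in the steps; tacl_pipeline and run_pipeline are hypothetical helper names, not part of tacl. By default it only prints the commands (a dry run), so you can check the paths before anything runs.

```python
import shlex
import subprocess


def tacl_pipeline(source_dir="source_dir", xml_dir="xml_dir",
                  corpus_dir="corpus_dir", db="cbeta.db",
                  catalogue="catalogue.txt", min_size=1, max_size=6):
    """Return the tacl commands from the steps above, in the order they must run."""
    return [
        ["tacl", "prepare", source_dir, xml_dir],  # source XML -> prepared XML
        ["tacl", "strip", xml_dir, corpus_dir],    # prepared XML -> stripped corpus
        ["tacl", "ngrams", db, corpus_dir, str(min_size), str(max_size)],
        ["tacl", "diff", db, corpus_dir, catalogue],
    ]


def run_pipeline(results_path="diff-results.csv", dry_run=True, **paths):
    """Print (dry run) or execute the pipeline; tacl diff output goes to results_path."""
    commands = tacl_pipeline(**paths)
    if dry_run:
        for cmd in commands[:-1]:
            print(shlex.join(cmd))
        # Mirror the shell redirect used in the steps above.
        print(shlex.join(commands[-1]), ">", shlex.quote(results_path))
        return commands
    # Each step consumes the previous step's output, so stop at the first failure.
    for cmd in commands[:-1]:
        subprocess.run(cmd, check=True)
    # tacl diff writes its results to stdout; send them to a file,
    # as "> diff-results.csv" does in the shell.
    with open(results_path, "w", encoding="utf-8") as out:
        subprocess.run(commands[-1], check=True, stdout=out)
    return commands
```

Calling run_pipeline() prints the four commands; run_pipeline(dry_run=False) actually runs them.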
Does this help?
What about corpus_dir? What does it contain, and how can I make one?
corpus_dir is created by tacl strip: it takes the files in xml_dir (itself created as the output of tacl prepare) and outputs the stripped versions of them in whatever you specify as corpus_dir.
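To see what tacl strip actually produced, you can list everything under corpus_dir. This is a minimal Python sketch that assumes only that the stripped output is a set of files somewhere below corpus_dir; the exact layout tacl uses is not described here, and list_corpus_files is a hypothetical helper name.

```python
from pathlib import Path


def list_corpus_files(corpus_dir="corpus_dir"):
    """Return every file below corpus_dir as a sorted list of relative paths."""
    root = Path(corpus_dir)
    if not root.is_dir():
        # corpus_dir only exists after tacl strip has been run.
        raise FileNotFoundError(f"{root} does not exist; run tacl strip first")
    return sorted(
        p.relative_to(root).as_posix()  # forward slashes on every OS
        for p in root.rglob("*")
        if p.is_file()
    )
```

An empty list means tacl strip ran but wrote nothing, which usually points back at an empty or wrong xml_dir.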
In my example, xml_dir, corpus_dir, catalogue.txt, cbeta.db, and diff-results.csv are all paths that you specify. Of these, only catalogue.txt needs to have any content before you run those commands in that sequence.
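Because catalogue.txt is the one input you write by hand, it can be worth checking it before running tacl diff. The sketch below assumes a catalogue format of one "text-name label" pair per line, separated by whitespace, with blank lines ignored; that format is an assumption here, so compare it with the tacl documentation for your version. read_catalogue is a hypothetical helper name.

```python
def read_catalogue(path="catalogue.txt"):
    """Parse a catalogue, assuming one 'text-name label' pair per line."""
    catalogue = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(
                    f"{path}:{lineno}: expected 'name label', got {line.strip()!r}")
            name, label = fields
            if name in catalogue:
                raise ValueError(f"{path}:{lineno}: {name} is listed twice")
            catalogue[name] = label
    return catalogue
```

The text names in the catalogue also have to match the names tacl uses for the texts in corpus_dir; this sketch does not check that.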
It is helpful! Could you please explain how to manipulate the results (with tacl results, align, or highlight) in this case? I had trouble with them, as shown in the attached image, even though pandas, biopython, etc. are all installed. Thanks a lot for writing and sharing this software!
So in that case, as the last line of the error text says, there is no results file named diff-result.csv in that directory, so tacl is unable to manipulate those results. Presumably the results are in a file with a different name, in a different directory, or both.
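A quick way to track down a misnamed results file is to check for the expected name and, if it is missing, list the CSV files that are actually in the directory. This is a minimal Python sketch; find_results is a hypothetical helper name, and it only looks in a single directory (it does not search subdirectories).

```python
from pathlib import Path


def find_results(expected="diff-results.csv", search_dir="."):
    """Return the path to the results file, or raise listing the CSVs present."""
    path = Path(search_dir) / expected
    if path.is_file():
        return path
    candidates = sorted(p.name for p in Path(search_dir).glob("*.csv"))
    raise FileNotFoundError(
        f"{path} not found; CSV files in {search_dir}: {candidates or 'none'}"
    )
```

For example, asking for diff-result.csv in a directory that holds diff-results.csv raises an error whose message lists diff-results.csv, which shows the name mismatch directly.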
I have read the documentation, but I still have a problem. I have 20 CBETA XML files (with 20 different labels, say 1 to 20), and I want to produce a diff of them. Could you please provide a step-by-step tutorial for this?
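For a case like this, the only file that needs writing by hand is the catalogue that gives each text its own label. The sketch below generates one, assuming the "text-name label" per-line format and using hypothetical text names T0001 to T0020; the real names must match whatever tacl uses for the texts in corpus_dir. write_catalogue is a hypothetical helper name.

```python
def write_catalogue(text_names, path="catalogue.txt"):
    """Write one 'name label' line per text, labelling them 1, 2, 3, ... in order."""
    with open(path, "w", encoding="utf-8") as fh:
        for label, name in enumerate(text_names, start=1):
            fh.write(f"{name} {label}\n")


if __name__ == "__main__":
    # Hypothetical names; replace with the actual text names in corpus_dir.
    names = [f"T{n:04d}" for n in range(1, 21)]
    write_catalogue(names)
```

With the catalogue in place, the prepare, strip, ngrams, and diff commands from the earlier reply stay the same; tacl diff then compares the 20 labelled texts using that catalogue.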