Open meganrwong opened 2 years ago
We have a PDF of the 1992 version. We need to extract the definitions from it to produce HTML fragments. This will take time, and we need an approach for extracting text from the PDF.
Required for the following methods: 5A2, 6B2, 7A2, 7B1, 7C1a, 7C2, 9C2, 14E2, 14F4, 14F5, 14G2
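One possible approach, as a rough sketch only: pull the raw text for a page range out of the PDF, then wrap it into a simple HTML fragment for manual clean-up. This assumes the third-party `pypdf` package for the extraction step. The function names, the page-range arguments and the fragment layout (`<section>`, `<h2>`, `<p>`) are hypothetical and are not taken from the existing fragments in the repo. Images and formulas are not handled at all.

```python
import re
from html import escape


def extract_pdf_text(pdf_path, first_page, last_page):
    """Return the text of pages first_page..last_page (1-based, inclusive).

    Assumption: the third-party pypdf package is installed. It is imported
    here so the rest of the sketch can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    texts = []
    for index in range(first_page - 1, last_page):
        # extract_text() can return an empty result for scanned pages.
        texts.append(reader.pages[index].extract_text() or "")
    return "\n".join(texts)


def text_to_html_fragment(method_code, text):
    """Wrap extracted text in a hypothetical HTML fragment layout.

    Blank lines separate paragraphs; line breaks inside a paragraph are
    collapsed to single spaces. All text is HTML-escaped.
    """
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = [" ".join(block.split()) for block in blocks]
    paragraphs = [p for p in paragraphs if p]

    parts = [f'<section id="{escape(method_code, quote=True)}">']
    parts.append(f"  <h2>{escape(method_code)}</h2>")
    parts.extend(f"  <p>{escape(p)}</p>" for p in paragraphs)
    parts.append("</section>")
    return "\n".join(parts)
```

The extracted text would still need proofreading against the printed version, since PDF text extraction often breaks tables, subscripts and formulas.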
@dr-shorthair @amacleod-cerdi had a crack at this. It is error-prone, and we'll have to create .jpg files for all the images and possibly the formulas. There may be better solutions out there for extracting HTML and .jpg files from the text in a PDF. But I suspect that in our case this is a lot of work that will return minimal value to users of this vocab. So I suggest we close this issue.
There are methods in the current machine-readable version that come from version 1 of the text: URN:ISBN:978-0-909605-68-1
The methods included were those that do not appear in version 2 of the text (i.e. chapter 1 and its procedures), and those that were split out into separate procedures in chapter 2. For example, 05A2 (version 1 text) is replaced in version 2 by https://raw.githack.com/ANZSoilData/def-au-scma/master/html/05-soluble-Cl/05A2a.html and https://raw.githack.com/ANZSoilData/def-au-scma/master/html/05-soluble-Cl/05A2b.html
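The codes in this issue are written without a leading zero (5A2), while the HTML file names use two digits (05A2a). A small helper along these lines could map one form to the other when checking which methods are already covered. The function name and the accepted code pattern are assumptions, based only on the codes mentioned in this thread.

```python
import re

# Assumed pattern: 1-2 digits, a capital letter, digits, optional lowercase
# suffix. This covers codes such as 5A2, 7C1a, 14E2 and 05A2a.
_METHOD_CODE = re.compile(r"^(\d{1,2})([A-Z]\d+[a-z]?)$")


def normalise_method_code(code):
    """Return the method code with a two-digit, zero-padded leading number."""
    match = _METHOD_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"Unrecognised method code: {code!r}")
    number, rest = match.groups()
    return f"{int(number):02d}{rest}"
```

For example, `normalise_method_code("5A2")` gives `"05A2"`, and codes that already have two digits, such as `"14E2"`, come back unchanged.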
As far as I know there is no electronic version of version 1. I will look into it.