Closed IanMayo closed 1 year ago
Hello, @IanMayo. This is the next issue we are starting, right?
yes please :-)
Hi @IanMayo, can you please tell me what you mean by the `full_soup` object in the comment above?
Sure. When we create a new `soup` object in BS4, it is a full document, with a schema, and we can create child elements using it. This is the `full_soup`. Well, we create two types of BS4 document: an HTML one which we parse, and an XML (DITA) one that we are writing to.

When we extract elements from the `full_soup` and pass them into our functions, we've been calling those elements `soup`, since we manipulate them just like the original document. But we can't create child elements using them. To create child elements, we need to pass the `full_soup` object as a parameter.
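A minimal sketch of that distinction, assuming the `bs4` package is installed. The HTML fragment and element names are made up for illustration, and `html.parser` stands in for whatever parser the project actually uses for the DITA document:

```python
from bs4 import BeautifulSoup

# Source document: parsed HTML (an invented fragment for illustration).
html_soup = BeautifulSoup("<div id='intro'><p>Hello</p></div>", "html.parser")

# Destination document: an empty full document that we write into.
# Assumption: the real project writes DITA XML with an XML parser;
# html.parser is used here only so the sketch needs no extra dependency.
dita_soup = BeautifulSoup("", "html.parser")

# An element extracted from the full document is a Tag, not a BeautifulSoup.
element = html_soup.find("div")

# new_tag() belongs to the full document object, so new elements are
# created through it and then attached wherever they are needed.
section = dita_soup.new_tag("section")
section.string = element.get_text()
dita_soup.append(section)

print(type(html_soup).__name__, type(element).__name__)
print(section)
```

The extracted element can be searched and modified like the document it came from, but creating a new element goes through the full document object, which is why it has to be passed along.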
So, we either have to introduce `soup` and `full_soup`, or we switch our existing `soup` term to `tag`, and let the original BS4 objects we create be called `html_soup` and `dita_soup`.

Hmm, let's switch to `tag` for the matching object that we pass into our `process` functions. So, they will receive `tag`, `target` and `dita_soup` - where we use `dita_soup` for `new_tag()` calls.
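Under that naming, a `process` function might look roughly like the sketch below. `process_heading` and the `topic`/`title` elements are hypothetical, chosen only to show the three parameters, and `html.parser` is again an assumption made to keep the example dependency-free:

```python
from bs4 import BeautifulSoup


def process_heading(tag, target, dita_soup):
    """Hypothetical process function: copy a heading's text into a <title>."""
    # Element creation goes through the full destination document.
    title = dita_soup.new_tag("title")
    title.string = tag.get_text(strip=True)
    # target is the destination element being built up.
    target.append(title)
    return target


html_soup = BeautifulSoup("<h1> Overview </h1>", "html.parser")
dita_soup = BeautifulSoup("", "html.parser")

topic = dita_soup.new_tag("topic")
dita_soup.append(topic)

process_heading(html_soup.find("h1"), topic, dita_soup)
print(topic)
```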
The parser.py file is really long.
I suggest this refactoring, which shouldn't take long at all:
`process_class_file` is particularly long. I suggest this strategy: the target document is in 4 sections. Represent this as 4 functions in the file. Each function will receive the whole `soup` object, the target `dita_body` object, and the `full_soup` destination object. Each function will try to find its tag of interest; if it's there, it will process it and append the resulting object to the `dita_body`.
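The strategy could be sketched as follows. The section names, `id` values and function names are invented for illustration (the issue does not say what the four sections are), only two of the four functions are shown, and `html.parser` is an assumption:

```python
from bs4 import BeautifulSoup


def _copy_section(soup, dita_body, full_soup, section_id):
    """Shared helper (hypothetical): find one section and append it if present."""
    found = soup.find("div", id=section_id)   # the tag of interest
    if found is None:
        return                                # section absent: nothing to append
    section = full_soup.new_tag("section")
    section["id"] = section_id
    section.string = found.get_text(strip=True)
    dita_body.append(section)


def process_summary(soup, dita_body, full_soup):
    _copy_section(soup, dita_body, full_soup, "summary")


def process_details(soup, dita_body, full_soup):
    _copy_section(soup, dita_body, full_soup, "details")


# This input has a "summary" block but no "details" block.
soup = BeautifulSoup("<div id='summary'>Short text</div>", "html.parser")
full_soup = BeautifulSoup("", "html.parser")
dita_body = full_soup.new_tag("body")
full_soup.append(dita_body)

# The long function shrinks to a loop over the per-section functions.
for process in (process_summary, process_details):
    process(soup, dita_body, full_soup)

print(dita_body)
```

Each function stays small and independently testable, and a missing section simply contributes nothing to `dita_body`.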