Refactor python

IanMayo commented 1 year ago

The parser.py file is really long.

I suggest this refactoring, which shouldn't take long at all:

- for each of the largest functions in this file (i.e. more than a screenful), move the function to another file
- process_class_file is particularly long. I suggest this strategy: the target document is in 4 sections, so represent these as 4 functions in the file. Each function will receive the whole soup object, the target dita_body object, and the full_soup destination object. It will try to find its tag of interest; if the tag is there, it will process it and append the resulting object to the dita_body.
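
A minimal sketch of the strategy in the second point, to make the shape concrete. Everything except the parameter names soup, dita_body and full_soup is an assumption for illustration: the sample HTML, the section functions (process_title, process_summary, process_images) and the output tag names are hypothetical, only three of the four sections are shown, and html.parser is used for both documents just to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical input; the real class files will look different.
SAMPLE_HTML = """
<html><body>
  <h1>Example class</h1>
  <div id="summary"><p>Short summary.</p></div>
</body></html>
"""


def process_title(soup, dita_body, full_soup):
    """Hypothetical section function: copy the <h1> text into a <title>."""
    h1 = soup.find("h1")
    if h1 is None:
        return  # tag of interest is absent, nothing to append
    title = full_soup.new_tag("title")
    title.string = h1.get_text(strip=True)
    dita_body.append(title)


def process_summary(soup, dita_body, full_soup):
    """Hypothetical section function: copy the summary div into a <section>."""
    div = soup.find("div", id="summary")
    if div is None:
        return
    section = full_soup.new_tag("section")
    section.string = div.get_text(strip=True)
    dita_body.append(section)


def process_images(soup, dita_body, full_soup):
    """Hypothetical section function: the sample has no images div, so this is a no-op."""
    div = soup.find("div", id="images")
    if div is None:
        return
    fig = full_soup.new_tag("fig")
    fig.string = div.get_text(strip=True)
    dita_body.append(fig)


def process_class_file(html_text):
    """Slimmed-down driver: one short call per section of the target document."""
    soup = BeautifulSoup(html_text, "html.parser")
    full_soup = BeautifulSoup("", "html.parser")  # destination document
    dita_body = full_soup.new_tag("body")
    full_soup.append(dita_body)
    for section_fn in (process_title, process_summary, process_images):
        section_fn(soup, dita_body, full_soup)
    return full_soup


result = process_class_file(SAMPLE_HTML)
print(result)
```

With this layout each section function can live in its own file, and process_class_file shrinks to the loop above.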

AbegaM commented 1 year ago

Hello, @IanMayo. This is the next issue we are starting, right?

IanMayo commented 1 year ago

yes please :-)

AbegaM commented 1 year ago

Hi @IanMayo, can you please tell me what you mean by the full_soup object in the comment above?

IanMayo commented 1 year ago

Sure. When we create a new soup object in BS4, it is a full document, with a schema, and we can create child elements using it. This is the full_soup. We actually create two types of BS4 document: an HTML one that we parse, and an XML (DITA) one that we write to.

When we extract elements from the full_soup and pass them into our functions, we've been calling those elements soup - since we manipulate them just like the original document. But we can't create child elements using them. To create child elements, we need to pass the full_soup object as a parameter.

So, we either have to introduce soup and full_soup, or we switch our existing soup term to tag, and let the original BS4 objects we create be called html_soup and dita_soup. Hmm, let's switch to tag for the matching object that we pass into our process functions. So, they will receive tag, target and dita_soup - where we use dita_soup for new_tag() calls.
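
A rough sketch of how a process function might look under the naming proposed above. The names tag, target, dita_soup and html_soup come from the comment; the function name process_summary, the sample markup and the output tag names are hypothetical, and html.parser is used for both documents only to keep the example dependency-free (the project may well use a different parser for the DITA side).

```python
from bs4 import BeautifulSoup


def process_summary(tag, target, dita_soup):
    """Hypothetical process function.

    tag       -- the matching element extracted from html_soup
    target    -- the DITA element that results are appended to
    dita_soup -- the full DITA document, used here only for new_tag() calls
    """
    p = dita_soup.new_tag("p")  # new elements come from the full document
    p.string = tag.get_text(strip=True)
    target.append(p)


# The two full BS4 documents: one we parse, one we write to.
html_soup = BeautifulSoup('<div id="summary">Short summary.</div>', "html.parser")
dita_soup = BeautifulSoup("", "html.parser")

target = dita_soup.new_tag("body")
dita_soup.append(target)

# The caller finds the element of interest and passes it in as tag.
tag = html_soup.find("div", id="summary")
if tag is not None:
    process_summary(tag, target, dita_soup)

print(dita_soup)
```

The point of the split is that tag and target are ordinary elements that can be searched and appended to, while dita_soup is the only object the function asks to create new elements.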

DeepBlueCLtd / LegacyMan

Refactor python #308