Writing DITA in Run 1

IanMayo commented 11 months ago

Run 1 is used to collate a list of link targets (plus the first div on each page) that we then use to parse the content.

But Run 1 is also generating DITA content, which makes me suspect we're generating it twice.
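Since Run 1 only needs the link targets, one way to make sure it writes nothing is to keep all file output in the second pass. The sketch below illustrates that split and is not the LegacyMan parser: `collect_link_targets`, `write_dita`, and the anchor regex are hypothetical, and the DITA it writes is a placeholder topic rather than a real conversion.

```python
import re
from pathlib import Path

# Hypothetical pattern: anchors other pages can link to (<a name="..."> or <a id="...">).
ANCHOR_RE = re.compile(r'<a\s[^>]*?\b(?:name|id)="([^"]+)"', re.IGNORECASE)


def collect_link_targets(html_files):
    """Run 1 (sketch): record each page's link targets without writing any files."""
    targets = {}
    for path in html_files:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        targets[Path(path).name] = ANCHOR_RE.findall(text)
    return targets


def write_dita(html_files, targets, out_dir):
    """Run 2 (sketch): the only function that writes anything under out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in html_files:
        name = Path(path).stem
        ids = targets.get(Path(path).name, [])
        # Placeholder body: one empty paragraph per collected target id.
        body = "".join(f'<p id="{i}"/>' for i in ids)
        (out_dir / f"{name}.dita").write_text(
            f'<topic id="{name}"><title>{name}</title><body>{body}</body></topic>',
            encoding="utf-8",
        )
```

With this split, an `exit()` after the first pass would leave the output folder untouched, because only `write_dita` creates files.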

Here is my experiment:

1. Delete target
2. Run python parser/parse_single_file.py data data/Spain.Legacy/Phase_G.html
3. Examine target/dita/regions and see that quite a few folders of data are present.

OK. I inserted an exit() in parse_single_file after Run 1 was complete, and I still got generated content in target/dita.

My compile/debug cycle is currently around 7 minutes, so I'm keen to trim away what I can :-)

robintw commented 11 months ago

Yes, a few things are still generated in Run 1 because it was easier to generate them than not. That output comes from the old code that dealt with the category pages and so on, and it was simpler to leave that code alone. The files being produced are the ones generated from region pages, category pages, etc.

I can stop those from being produced during Run 1, but it'll probably be a bit fiddly and I don't know how much time it will actually save. I'll give it a try. It could be that something like copying all the images across is taking a while on the real data (if there are lots of large images), in which case we can try to be a bit more clever about when and where we copy images.
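One possible way to be "more clever" about image copying is to skip any image that is already in the output folder and at least as new as its source. This is a hypothetical helper (`copy_if_stale`), assuming images are copied file by file; it compares modification times and is not taken from the project's code.

```python
import shutil
from pathlib import Path


def copy_if_stale(src, dst):
    """Copy src to dst only if dst is missing or older than src; return True if copied."""
    src, dst = Path(src), Path(dst)
    if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
        return False  # already up to date, skip the copy
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 also copies the modification time
    return True
```

Note that this only saves time when the output folder survives between runs; deleting target first (as in the experiment above) forces every image to be copied again.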

I'll have a look at this, but it definitely won't be today. Hopefully tomorrow or at the weekend.

IanMayo commented 11 months ago

Run 1 becomes optional now that we have a persistent shopping list (#557).
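The details of #557 aren't shown in this thread, so the following is only a guess at the general idea: if the link-target list produced by Run 1 is saved to disk, later runs can load it instead of repeating Run 1. `load_or_build_targets`, the JSON format, and the cache file name are all assumptions for illustration.

```python
import json
from pathlib import Path


def load_or_build_targets(cache_path, build):
    """Reuse a saved link-target list if present; otherwise call build() (Run 1) and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    targets = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(targets, indent=2, sort_keys=True), encoding="utf-8")
    return targets
```

Two design points follow from this approach: the saved list should live outside target if target is deleted before each run, and it needs to be rebuilt whenever the source HTML changes, otherwise stale link targets would be used.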

DeepBlueCLtd / LegacyMan #522