Closed IanMayo closed 1 year ago
Yes, a few things are still generated in Run 1 because it was easier to generate them than not. This is the old code that dealt with the category pages and so on, and it was easier to leave it alone. The files being produced are the ones that come from region pages, category pages etc.
I can stop those being produced during Run 1, but it'll probably be a little fiddly and I don't know how much time it will actually save. I'll give it a try. It could be that something like copying all the images across is taking a while for the real data (if there are lots of large images), in which case we can try to be a bit cleverer about when and where we copy images.
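One way of being cleverer about image copying would be to skip files that already exist at the destination and look unchanged. This is only a rough sketch, not the project's code: `copy_if_changed` is a hypothetical helper, and the "same size, destination not older" check is an assumption about what counts as unchanged.

```python
import shutil
from pathlib import Path


def copy_if_changed(src_dir, dest_dir):
    """Copy files from src_dir to dest_dir, skipping ones that look unchanged.

    Hypothetical helper: a file is treated as unchanged when the destination
    exists with the same size and a modification time at least as new as the
    source. dest_dir is assumed to sit outside src_dir.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = dest_dir / src.relative_to(src_dir)
        if dest.exists():
            s, d = src.stat(), dest.stat()
            if s.st_size == d.st_size and d.st_mtime >= s.st_mtime:
                continue  # looks unchanged, so skip the copy
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        copied.append(dest)
    return copied
```

On a second run over the same data this should return an empty list, so the time spent on large images would only be paid once.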
I'll have a look at this, but it definitely won't be today. Hopefully tomorrow or at the weekend.
Run 1 becomes optional now that we have persistent shopping list #557
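To illustrate why a persistent shopping list makes Run 1 optional, here is a minimal sketch of the idea. It does not show how #557 actually stores the list: the JSON file, the function name `load_or_build_shopping_list`, and the `build` callable standing in for Run 1 are all assumptions.

```python
import json
from pathlib import Path


def load_or_build_shopping_list(cache_path, build):
    """Return the saved shopping list if present, otherwise build and save it.

    `build` is a zero-argument callable standing in for Run 1 (hypothetical).
    The list is assumed to be JSON-serialisable.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # A saved list exists, so Run 1 can be skipped entirely.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    shopping_list = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(shopping_list, indent=2), encoding="utf-8")
    return shopping_list
```

Deleting the saved file would force Run 1 to happen again, which is one simple way to refresh the list when the source data changes.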
Run 1 is used to collate a list of link targets (plus `first div on page`) that we then use to parse the content. But Run 1 is also generating DITA content, which makes me suspect we're generating it twice.
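For anyone unfamiliar with what collating link targets involves, a pass like Run 1 could be sketched roughly as below using only the standard library. This is not the parser's real code: `LinkTargetCollector` and `collect` are hypothetical names, and recording the attributes of the first `div` is just my guess at what "first div on page" means here.

```python
from html.parser import HTMLParser


class LinkTargetCollector(HTMLParser):
    """Collect href values and the attributes of the first div on a page."""

    def __init__(self):
        super().__init__()
        self.link_targets = []
        self.first_div_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.link_targets.append(href)
        elif tag == "div" and self.first_div_attrs is None:
            # Only the first div encountered is recorded.
            self.first_div_attrs = dict(attrs)


def collect(html_text):
    """Return (link_targets, first_div_attrs) for one page of HTML."""
    collector = LinkTargetCollector()
    collector.feed(html_text)
    collector.close()
    return collector.link_targets, collector.first_div_attrs
```

A pass like this only reads the pages, so it would not need to write anything under `target/dita`.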
Here is my experiment:

- `target`
- `python parser/parse_single_file.py data data/Spain.Legacy/Phase_G.html`
- `target/dita/regions` - and see that there are quite a few folders of data present.

Ok. I inserted an `exit()` in `parse_single_file` after run 1 was complete, and I still get generated content in `target/dita`.

My compile/debug cycle is currently around 7 minutes, so I'm keen to trim away what I can :-)
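To make the check after the `exit()` repeatable, a small helper could count what has landed under the output folder. This is a sketch with a hypothetical function name, `count_files_by_folder`, and it assumes the output lives under `target/dita` as described above.

```python
from collections import Counter
from pathlib import Path


def count_files_by_folder(root):
    """Count files under root, grouped by their top-level subfolder.

    Files sitting directly in root are counted under ".".
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumed location of the generated DITA content.
    for folder, n in sorted(count_files_by_folder("target/dita").items()):
        print(f"{folder}: {n} files")
```

Running it once after Run 1 and again after the full parse would show which folders Run 1 is responsible for.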