Open ba001 opened 7 years ago
no rush on this--just whenever you get a chance, can you look into this? i realized it's going to be quite a job for me to downgrade back to El Capitan
There is no way for me to debug an issue that I can't replicate.
Of course not. The question now is: why can't you replicate it? Because you aren't using Sierra? Or because you are, and the importing works just fine?
My development environment is Linux.
FWIW - I upgraded to Sierra today to check this out. And I can confirm the slow import. Here are some steps I tried, but to no avail.
Info files: 100%|██████████████████████████████| 71/71 [00:00<00:00, 607.75it/s]
BAD files: 0%| | 0/1 [00:00<?, ?it/s]
Trying ../../replice/works/america.a.xml
BAD files: 100%|██████████████████████████████████| 1/1 [01:01<00:00, 61.85s/it]
Traceback (most recent call last):
  File "import.py", line 451, in <module>
    main()
  File "import.py", line 447, in main
    importer.import_data()
  File "import.py", line 67, in import_data
    self.process_relationships()
  File "import.py", line 111, in process_relationships
    self.process_relationship(entry)
  File "import.py", line 116, in process_relationship
    obj.objects_from_same_matrix.extend(self.objects_for_id_string(entry.same_matrix_ids))
AttributeError: 'NoneType' object has no attribute 'objects_from_same_matrix'
The "Trying ../../replice/works/america.a.xml" line is just something I added to see if a particular BAD was causing the hang-up. It's not.
You can see it took less than a second to parse all 71 info files, but about 60s to process the single BAD file. Not sure how much this info helps, but that is about all I can offer.
I'll be updating the deployment scripts to back up the database next week. That way, if the import fails, we can revert the database. Once I have those scripts written, I can write some documentation on how to "sync" dev with your local machine. It's not a pretty way to develop, but it will have to suffice for now.
The first thing I suggest you guys do is update your libxml and python LXML libraries to the most current available versions, as that might solve the problem. Once that is done, I'll look into creating a script you can use to profile the execution of the import to see what is going on.
Updated python LXML libraries and libxml. No luck
I've updated the import script to have a --profile option. Run import.py as you normally would, with this option added. Let it run for a while (the longer the better, but at least an hour or two) and then press Ctrl-C to terminate the import process. A file called import_stats.out should be created; post it here and I'll see what I can do about the source of the slowdown.
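For anyone curious what a profile option like this might do under the hood, here is a minimal sketch using the standard library's cProfile and pstats. This is not the actual code in import.py; `run_with_profile` and `top_functions` are hypothetical names, and the only detail taken from this thread is the output file name import_stats.out.

```python
import cProfile
import pstats


def run_with_profile(func, stats_path="import_stats.out"):
    """Run func under cProfile and write the stats file even if interrupted.

    Hypothetical helper: an assumption about how a --profile flag could work.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    except KeyboardInterrupt:
        # The user stopped the run from the keyboard; keep what was collected.
        pass
    finally:
        profiler.disable()
        profiler.dump_stats(stats_path)


def top_functions(stats_path="import_stats.out", limit=15):
    """Print the `limit` entries with the highest cumulative time."""
    stats = pstats.Stats(stats_path)
    stats.sort_stats("cumulative").print_stats(limit)
    return stats
```

Calling `top_functions()` on a posted stats file lists the calls where most of the time went, which is the kind of listing that would point at a single slow step.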
getting an error:
(blake) english00024:blakearchive michaelfox$ python import.py ../../data --profile
Traceback (most recent call last):
File "import.py", line 455, in
something i should change?
This should be operating properly now.
attached:
I just pushed a version with an optimization; see how that works. If it is still too slow, post another profile log using that version.
Little better but still very slow. I stopped it after a couple hours (I think) at 60 some percent. Here are the new stats:
Your profiler is giving me very incomplete information for some reason. I've updated the import script to try to work around the issue. Please run the latest code and post another log.
Alright, the problem is the XSLT transform is running very slowly on your system. I'm not sure why this is the case, but in any event it isn't fixable from our end.
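One way to check this outside the import script is to time an lxml XSLT transform on its own and compare the numbers between a Sierra machine and a Linux one. The sketch below is only an illustration under stated assumptions: the identity stylesheet and the generated document are stand-ins, not the project's real XSLT or BAD files, and `time_call` is a hypothetical helper.

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Identity stylesheet: a stand-in for the project's real XSLT (assumption).
IDENTITY_XSLT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""


def main():
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed; nothing to time")
        return
    transform = etree.XSLT(etree.XML(IDENTITY_XSLT))
    # Synthetic document; swap in etree.parse(<path to a real file>) to test real data.
    doc = etree.XML(b"<root>" + b"<item n='1'>text</item>" * 10000 + b"</root>")
    print("best transform time: %.4fs" % time_call(transform, doc))


if __name__ == "__main__":
    main()
```

If the same script is dramatically slower on Sierra than on Linux, that would support the idea that the slowdown sits in the XSLT layer rather than in the import logic.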
would it help if i gave you a login to my Mac (Sierra) at my office so that you can try to debug from it? i can set up the dev environment for you under that login.
Not really. I don't want to descend into debugging a large and unfamiliar external library. That would eat up more time than the workarounds, particularly given my lack of familiarity with macOS internals.
My personal suggestion would be to get VirtualBox and run a Linux VM.
ok, i can try that. in the meantime, are there other xml libraries we could try besides lxml?
None that do XSLT.
As a side note, apparently the XSLT facilities of lxml come from libxslt, not libxml2. Perhaps you could try upgrading that library?
looks like i've got the latest version, 1.1.29. maybe this is something for stackexchange or an lxml (libxslt) bug report
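It may also be worth confirming which libxml2 and libxslt versions lxml itself is using, since a prebuilt lxml package can bundle its own copies that differ from the system-installed library. A small sketch using lxml's version constants; `lxml_versions` and `format_version` are hypothetical helper names.

```python
def lxml_versions():
    """Return the library versions lxml reports, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
        "libxslt (runtime)": etree.LIBXSLT_VERSION,
        "libxslt (compiled)": etree.LIBXSLT_COMPILED_VERSION,
    }


def format_version(version):
    """Turn a version tuple such as (1, 1, 29) into a dotted string."""
    return ".".join(str(part) for part in version)


if __name__ == "__main__":
    versions = lxml_versions()
    if versions is None:
        print("lxml is not installed")
    else:
        for name, version in versions.items():
            print("%s: %s" % (name, format_version(version)))
```

Including this output in a Stack Exchange question or an lxml bug report would show exactly which library builds are involved.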
i just updated my work mac to Sierra, and now i'm having a problem importing the BADs. the process goes up by 3 or 4 percentage points and then stalls. i verified that the problem cropped up after updating to Sierra by replicating the problem on my home mac, which hadn't yet been updated and was importing just fine.