MarcusBarnes / islandora_compound_batch

Provides the basic ability to batch import compound objects into Islandora.
GNU General Public License v3.0

Something I am doing is wrong #17

Open whikloj opened 7 years ago

whikloj commented 7 years ago

@MarcusBarnes I was trying this out and I'm not sure what I am doing wrong, but I ended up with 200 objects instead of a compound with 200 children.

I have run it twice, using the default --parent_relationship_pred and then setting it to --parent_relationship_pred=isConstituentOf. Same result after both ingests.

I have a directory structure like

/vagrant/test_compounds
     /compound_1
           /1
               OBJ.jpg
           /2
               OBJ.jpg
          ....
          /200
               OBJ.jpg
          MODS.xml
          structure.xml

I created the structure.xml by running php create_structure_files.php /vagrant/test_compounds/

It looks like this

<?xml version="1.0" encoding="utf-8"?>
<!--Islandora compound structure file used by the Compound Batch module. On batch ingest,
    'islandora_compound_object' elements become compound objects, and 'child' elements become their
    children. Files in directories named in child elements' 'content' attribute will be added as their
    datastreams. If 'islandora_compound_object' elements do not contain a MODS.xml file, the value of
    the 'title' attribute will be used as the parent's title/label.-->
<islandora_compound_object title="compound_1">
  <parent title="compound_1/1"/>
  <parent title="compound_1/10"/>
  <parent title="compound_1/100"/>
  <parent title="compound_1/101"/>
  <parent title="compound_1/102"/>
  <parent title="compound_1/103"/>
 ....

Then I ran

drush -u 1 islandora_compound_batch_preprocess --namespace=islandora --parent='islandora:compound_collection' --target=/vagrant/test_compounds

and drush -u 1 ibi --ingest_set=<set id>

Then I tried

drush -u 1 islandora_compound_batch_preprocess --namespace=islandora --parent='islandora:compound_collection' --parent_relationship_pred=isConstituentOf --target=/vagrant/test_compounds

Same result: when I go into islandora:compound_collection, there are two objects named MODS and 400 named OBJ. They are all compound objects.

This is obviously not the expected behaviour. What did I mess up?

MarcusBarnes commented 7 years ago

@whikloj If you are able to try with only one batch set ready for ingest, would you please first try running the following (I've changed the user from admin to 1):

drush --user=1 islandora_compound_batch_prune_relationships

This will clear the islandora_compound_batch database table of previously ingested object data. Then try running

drush --user=1 islandora_compound_batch_preprocess --namespace=islandora --parent=islandora:compound_collection --target=/vagrant/test_compounds

Then

drush -v --user=1 islandora_batch_ingest

If the above doesn't work, would you try with --user=admin to see what happens? Failing any insight from the above, would it be possible to send me a sample that I can run locally, so I can see whether anything in it might be causing the issue? We can talk about getting the sample via Skype.

Thank you for trying the module out.

mjordan commented 7 years ago

@whikloj does each of the source compound objects under /vagrant/test_compounds have its own structure.xml?

whikloj commented 7 years ago

@mjordan yes. Because I am currently creating only one compound, I have a single directory under test_compounds called compound_1, and it has 200 directories under it, each with an OBJ.jpg.

whikloj commented 7 years ago

So I traced it out. I was using this to generate a test compound, so I only stuck a MODS.xml in the top-level directory and an OBJ.jpg in each of the others. That meant every sub-directory, or "child", had exactly 1 file.

That is why I had no child elements in my structure.xml file. https://github.com/MarcusBarnes/islandora_compound_batch/blob/master/extras/scripts/tree_to_compound_object.xsl#L32

I changed that to count(file) > 0 and the structure.xml became

<?xml version="1.0" encoding="utf-8"?>
<!--Islandora compound structure file used by the Compound Batch module. On batch ingest,
    'islandora_compound_object' elements become compound objects, and 'child' elements become their
    children. Files in directories named in child elements' 'content' attribute will be added as their
    datastreams. If 'islandora_compound_object' elements do not contain a MODS.xml file, the value of
    the 'title' attribute will be used as the parent's title/label.-->
<islandora_compound_object title="compound_1">
  <child content="compound_1/1"/>
  <child content="compound_1/10"/>
  <child content="compound_1/100"/>
  <child content="compound_1/101"/>
  <child content="compound_1/102"/>
  <child content="compound_1/103"/>
 ....

I am ingesting this new batch and will report back.

I'm guessing count(file) > 1 was used because top-level directories have a MODS.xml and lower levels generally have more than that. I'll leave that one to you guys.
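To make the effect of that threshold concrete, here is a minimal Python sketch. It is not the module's XSLT: the function name classify_directories and its parameters are hypothetical, and the assumption that a directory failing the test is written as a 'parent' element is inferred only from the two structure.xml outputs shown in this thread.

```python
def classify_directories(file_counts, threshold):
    """Split directories into 'child' and 'parent' by file count.

    file_counts maps a directory path to the number of files directly in it.
    A directory is treated as a child when its count is strictly greater
    than threshold (mirroring count(file) > N); otherwise it is assumed to
    fall through to 'parent', as in the first structure.xml above.
    """
    result = {"child": [], "parent": []}
    for path, count in file_counts.items():
        kind = "child" if count > threshold else "parent"
        result[kind].append(path)
    return result


# Each child directory in the test data holds a single OBJ.jpg.
file_counts = {"compound_1/1": 1, "compound_1/2": 1, "compound_1/200": 1}

original = classify_directories(file_counts, threshold=1)  # count(file) > 1
patched = classify_directories(file_counts, threshold=0)   # count(file) > 0

print(len(original["child"]), len(patched["child"]))
```

With one file per directory, the stricter test finds no children at all, while the relaxed test treats every directory as a child.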

whikloj commented 7 years ago

And that was the problem. 👏

So I'm not sure if my use case was odd or if there is something you can do. I didn't get too far into the directory traversal you were doing.

If you want, you can fix this and point to the fix, or you can just close this, since I now know what to look for in the structure.xml before I get too far.
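For anyone who wants to automate that check before preprocessing, here is a minimal Python sketch. It is not part of the module: the function name summarize_structure is hypothetical, and it assumes only the element and attribute names visible in the structure.xml samples in this thread ('islandora_compound_object', 'child' with 'content', 'parent' with 'title').

```python
import xml.etree.ElementTree as ET


def summarize_structure(xml_data):
    """Return the compound title plus its 'child' and 'parent' entries."""
    root = ET.fromstring(xml_data)
    return {
        "title": root.get("title"),
        "children": [el.get("content") for el in root.iter("child")],
        "parents": [el.get("title") for el in root.iter("parent")],
    }


# Shortened versions of the two files shown earlier in this thread.
BROKEN = (
    '<islandora_compound_object title="compound_1">'
    '<parent title="compound_1/1"/><parent title="compound_1/10"/>'
    "</islandora_compound_object>"
)
FIXED = (
    '<islandora_compound_object title="compound_1">'
    '<child content="compound_1/1"/><child content="compound_1/10"/>'
    "</islandora_compound_object>"
)

broken_summary = summarize_structure(BROKEN)
fixed_summary = summarize_structure(FIXED)

for name, summary in (("broken", broken_summary), ("fixed", fixed_summary)):
    if not summary["children"]:
        print(name, "has no child elements; check it before ingesting")
    else:
        print(name, "has", len(summary["children"]), "child elements")
```

To check a file on disk, the same function can be given the file's raw bytes, for example the result of pathlib.Path(...).read_bytes() on a structure.xml.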

MarcusBarnes commented 7 years ago

@whikloj I'm glad the issue was narrowed down. I'll have to think more about this use case and how it would fit in, and investigate whether the potential fix would introduce other issues. I'm going to leave this open for now until I've had a little more time to think about it.

MarcusBarnes commented 5 months ago

Spam comment/user reported to GitHub. I've deleted the comment here.