Open dsotirho-ucsc opened 2 months ago
Each bundle contributes different files. The project inner entities in those contributions are inconsistent, but there is only one contribution to each file. Azul only considers contributions to any given outer entity (file, in this case), so there are no inconsistencies among the contributions to that file, since there is only one such contribution.
Each bundle also contributes to the project outer entity, and there are inconsistencies among the latest contributions to that entity, so the indexer should detect that and fail. This looks like a bug in reconcile_inner_entities, and we should test that assumption by applying the patch and stepping through that method. It's also possible that the method isn't even invoked when the inner entity type equals that of the outer entity. That would also constitute a bug.
At the moment we don't observe these inconsistencies in the wild, only in cans that we modified inconsistently, so this is lower priority.
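To make the distinction above concrete, here is a minimal sketch of the kind of check being described. It is not Azul's implementation and does not reflect how reconcile_inner_entities works. The Contribution class, the inconsistent_outer_entities function and the cell_count field are all hypothetical names chosen for illustration, and the sketch ignores the "latest contribution" selection entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
import json


@dataclass
class Contribution:
    # Hypothetical model of one bundle's contribution to one outer entity.
    bundle: str
    outer_type: str
    outer_id: str
    project: dict  # the project inner entity as seen by this bundle


def inconsistent_outer_entities(contributions):
    """Return the outer entities whose contributions disagree on the
    project inner entity. Contributions are only compared with other
    contributions to the same outer entity."""
    variants = defaultdict(set)
    for c in contributions:
        key = (c.outer_type, c.outer_id)
        # Serialize with sorted keys so equal dicts compare equal.
        variants[key].add(json.dumps(c.project, sort_keys=True))
    return sorted(key for key, seen in variants.items() if len(seen) > 1)


# Two bundles for the same project, each adding a different file and
# each carrying different project metadata (assumed values).
contributions = [
    Contribution('b1', 'files', 'f1', {'id': 'p1', 'cell_count': 1}),
    Contribution('b1', 'projects', 'p1', {'id': 'p1', 'cell_count': 1}),
    Contribution('b2', 'files', 'f2', {'id': 'p1', 'cell_count': 2}),
    Contribution('b2', 'projects', 'p1', {'id': 'p1', 'cell_count': 2}),
]

result = inconsistent_outer_entities(contributions)
print(result)
```

In this sketch each file has a single contribution, so nothing is flagged for the files, while the project outer entity receives two differing contributions and is reported.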
The hits[].projects values in a /index/{entity_type} response come from one bundle per hit, and are not an aggregate from all the bundles for a given project.
For example, imagine multiple bundles for a project, each adding a new file. Also imagine that each of these bundles has differing project metadata. The /index/files response for this project will return with each hit having different hits[].projects.… values than the other hits.
Ideally the hits[].projects values would be the same for all hits, and be an aggregate from all bundles for the project.
The patch below modifies an existing test (TestSchemaTestDataCannedBundle.test_project_cell_count) and its canned bundles to demonstrate the issue.
Console log: