Export unmatched records

RDmitchell commented 9 years ago

Add ability in the program for the user to export unmatched records, such as PM records that didn't have associated master buildings or buildings which don't have associated PM records.

RDmitchell commented 8 years ago

@StephenCzarnecki assigned this task. @danielmcquillen can provide guidance on FE aspects as needed

RDmitchell commented 8 years ago

@pipermerriam might also be able to work on this. I will check with him.

RDmitchell commented 8 years ago

I have assigned this to @pipermerriam for now

RDmitchell commented 8 years ago

@StephenCzarnecki -- just reassigned this to you.

StephenCzarnecki commented 8 years ago

Summary: Seems to me there are two ways of getting this behavior: Modifying some existing code that is used several other places or writing a new function that is similar to the existing export_buildings method.

After finding that the search functionality already allows for adding more-or-less arbitrary filter parameters I have been thinking about this new feature as "Export any Building Snapshots that meet some criteria". That is very close but different from the existing export behavior (in export_buildings) which is "Export any non-deleted Canonical Building Snapshots that meet some criteria".

The reason why the current behavior is restricted to non-deleted canonical snapshots is due to a function that is called by the export_buildings method: get_search_query. get_search_query has the filtering on active canonical buildings hard-coded to true and so can currently only return non-deleted canonical buildings. In addition to the export function get_search_query is used in the following: delete_matching_buildings, get_transfer_buildings, delete_buildings, add_buildings, remove_buildings.

We can't just use this as-is because the data we are looking to export is not going to be in a canonical record. There is a good chance any canonical record contains data from a source that is not the import file currently selected in the review matches screen.

My initial thought was along the lines of: "Well, 'canonical' and 'not deleted' are just some other criteria. So why not just alter get_search_query to make that explicit and change its current clients to match?" I'm still suspect this is the right approach but will try to play devils advocate for a bit.

One issue with changing get_search_query is that it makes more things have to be conscious of the difference between a canonical building and a building snapshot. That doesn't exactly strike me as a step in the right direction. In general I get the impression that the "default" activity is dealing with canonical buildings and other stuff is the exception. But of course we could have canonical be the default. So that might not be as much of an issue.

There is also the case of the existing clients. Their behavior is based on get_search_query and any changes to that function to allow for exporting non-canonical records may affect those functions as well. Hopefully the test coverage will catch everything but the more behavior change the more that can go wrong.

On the other hand having one central search function would ensure that things that should work the same do. There wouldn't be a choice of what to use, just what parameters are set. Some defaults are set but can be overridden. Seems potentially easier to maintain in the future.

Some details about what each change would involve:

OPTION #1: Change get_search_query Make the "canonical vs building snapshot" difference explicit in get_search_query. We can do that and make the default behavior be to return canonical buildings by adding something like only_return_active_canonical=True to get_search_query.

Then we can change the function that currently is used to display the matched and unmatched buildings (search_building_snapshots) to use get_search_query. filter_other_params seems like it already handles import_file_id so that should hopefully just work.

Finally we should be able to add a button to the building matching screen that just calls export_buildings with the appropriate parameters.

OPTION #2: Add export_buildings_from_file api: 1: Create export_buildings_from_file method basically using search from search_building_snapshots. (search_building_snapshots is what does the querying for the matched/unmatched buildings) Use the results from that in the same way they are currently used in export_buildings. 2: Add front-end button to call export_buildings_from_file.

I'm sure I've missed out on some reasons and some steps (and quite possibly am just missing a better alternative). Maybe this is a larger issue and there is an even-more-general solution. Let me know if there are any better approaches or reasons to prefer one path to another.

StephenCzarnecki commented 8 years ago

@nicholasserra When I was looking at this issue I eventually made my way to filter_other_params and is_column where (I think) you specify the django queryset filtering suffixes (things like __lt). Wondering a couple things, not urgent or anything at the moment:

1: Was it intentional to restrict them to 'lt', 'gt', 'lte', 'gte' or was that just what was needed at the time?

2: In filter_other_params it seems like this block could be de-indented one to skip the "if in_columns..." check. I'm not sure but it seems like there should be a way to save clients from having to add fields that they know they are checking. Things like mappable_types['project__slug'] = 'string' in get_search_query or db_columns['children__isnull'] = '' in search_building_snap

Or I suppose it could be there for exactly that reason? Making sure the clients verify that they really want whatever custom filter?

nicholasserra commented 8 years ago

@StephenCzarnecki

I know a lot of that stuff was there before I was hacking on it, but I think I can answer some of this.

So in_column only references those 4 query modifiers because those are the only ones that are being used, and they're stripped off if found, so that we can tell if that field in question is a DB column or not. in_column is really just a check to see if the field in question is a database field, or a field in the extra_data json blob.

So the reason for the in_column check in general is to know whether to use database queries, or to use our custom json blob logic.

I'm not sure if that answers the question you have, as I had some trouble understanding #2 entirely. Let me know your thoughts.

StephenCzarnecki commented 8 years ago

@nicholasserra

What you wrote confirms what I saw. This may be a bit of a moot point at the moment since I think I ran into another problem in this approach. What I was trying to get across in my second point is that there are other django filtering things that are not just database fields.

For example search_building_snapshots adds "childrenisnull" to the list of mappable type and then queries BuildingSnapshot. But "children" is not a field in the BuildingSnapshot table. It is a many-to-many field in the model but not a field on BuildingSnapshot itself. So childrenisnull is really talking about if there are any rows in the buildingsnapshot_children table that django created.

The reason I am asking is when I was looking at ways of making get_search_query not search for only canonical buildings one idea I had was was making things like "canonicalbuilding__active=True" work as a "filter_param".

Since that is not one of the current things returned by get_mappable_types it looked like in addition to having a client pass that as a parameter I'd either need to add "canonicalbuildingactive" to the list of db_fields or add "canonicalbuilding" to the list of fields and change in_column to recognize "active". I'd also need to change the bit in filter_other_params that currently looks like

elif('lt' in k or 'lte' in k or ....

to include a line that says '__active' in k or

And then any other similar thing (e.g. __count) in the future the same would be needed to be done.

Not sure if that clears things up at all, hope it does.

nicholasserra commented 8 years ago

Right, so if you wanted to make those methods more generic, and able to filter on "non-column" filters, like m2m related names, we would have to expand our is_column sanitization to check for all of those weird cases.

But i'm not sure how much work that would take, as those methods seem pretty tied to Buildings, especially because filter_other_params does those checks in the extra_data json blolb.

There also may be better ways to do the is_column checks, that wouldn't require us to strip off suffixes, and would make it easier to make the filtering more generic. I think the whole filtering process is also complicated because we have so many operators and special characters that change the way we filter.

RDmitchell commented 8 years ago

We may move this to Buidling List, once @nicholasserra has show matched and unmatched records working there.

RDmitchell commented 8 years ago

For now, we will shelve this issue -- with the Matching/Unmatching functionality in the Building List which then allows the user to export those records, this may not be needed. Am "unassigning" it.

nllong commented 6 years ago

This doesn't seem valid anymore.

RDmitchell commented 6 years ago

This functionality would potentially be useful in some sort of "automated" form, but right now it is pretty straightforward to set up filters to determine which ESPM records matched and which didn't. So I think it's fine to close this.

SEED-platform / seed

Export unmatched records #127