SEED-platform / seed

Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.
Other
107 stars 55 forks source link

Renaming a column tanks server performance #4460

Open ebeers-png opened 8 months ago

ebeers-png commented 8 months ago

Describe the bug

When a user renames a column name in an org with many buildings, it kills our server's performance. Can you rewrite the code to use celery and/or reduce the strain on the server?

For reference, in a BEAM org of about 4500 property views and 631 columns, my local instance took 7.33 minutes to rename a column, and ran at 100% CPU for that whole period - I couldn't load BEAM in another webpage until it finished.

haneslinger commented 8 months ago

I'm happy to investigate. Was it an extra data column or a canonical one?

ebeers-png commented 8 months ago

Thanks Hannah! This one was an extra data column.

haneslinger commented 8 months ago

Alright, so here's the skinny:

The POST /columns/{id}/rename endpoint is slow because we are doing an individual edit and save on each poroperty/taxlot. One would hope we could leverage update() to turn O(N) queries to O(1), but alas, update() doesn't call the pre/post save functions which is rather important.

HOWEVER, one could use the PUT /columns/{id} endpoint to simply change the displayname a column, which sorta kinda does the same thing as UI users are concerned, without all the db yuckiness.

Why we have both upsetting similar endpoints, idk, I have to imagine there is a historical reason. @nllong?

@ebeers-png, Is there a reason you are using POST /columns/{id}/rename over PUT /columns/{id}?

ebeers-png commented 8 months ago

Hmm, it's a good question, whether or not ordinary users have an actual need to change the column name...

In BEAM's case, we have scripts that use a list of hardcoded column names, so if a column name is misspelled for whatever reason then we would use POST /columns/{id}/rename to update that. But for a regular user, I can't think of any other reason at the moment. Technically you can use POST /columns/{id}/rename to combine columns, right?

haneslinger commented 8 months ago

Technically you can use POST /columns/{id}/rename to combine columns, right?

You would think, but looking at the implementation, I think it's a full overwrite.

@axelstudios agrees that this endpoint should be repurposed to be a combine or something.

ebeers-png commented 8 months ago

A combine endpoint would be useful!

We could also really use a column copy endpoint, that would copy property data from one column to another! Same functionality as POST /columns/{id}/rename, just subtract where it creates a new column and deletes the old data.

RDmitchell commented 8 months ago

@ebeers-png / @haneslinger -- users definitely need to be able to change column names.

So we need to make sure that functionality is available.

haneslinger commented 8 months ago

@ebeers-png / @haneslinger -- users definitely need to be able to change column names.

So we need to make sure that functionality is available.

@RDmitchell Can you explain why it's important for users to be able to not just change column display names, but also column names?

RDmitchell commented 8 months ago

@haneslinger -- ah ... I guess I didn't understand the functionality. I was thinking of display names.

But maybe we need to allow them to change the actual field name itself?

The issue is that mapping fields on import is a critical step in getting everything set up properly in SEED.

Sometimes, users don't do the mapping properly and fields that should be mapped to one field are mapped incorrectly.

And it is a big deal to try to straighten things out.

I'm not sure having users rename the internal field names is the best way to address this, but it might be .. ???

Could we have a discussion about this at the next iteration meeting to see what @nllong and the other programmers think?

I know it is probably a very big deal to remap internal field names, but I think we should talk about the possibility that users may need to do it.

haneslinger commented 8 months ago

I don't think this has anything to do with column mapping, but I could be wrong.

Could we have a discussion about this at the next iteration meeting to see what @nllong and the other programmers think? Definitely. There's some history here I'm not privy to.

RDmitchell commented 8 months ago

Well, when a file is imported, you map the fields from the file being imported to the fields in SEED, sometimes the fields that are part of the database and sometimes to extra data fields.

If you get that mapping wrong, it's hard, if not impossible to fix.

I guess I was thinking that actually changing the field name that was mapped might be a solution

But maybe I am wrong.

I guess that is why I would like to have a discussion with others about it.

ebeers-png commented 8 months ago

Well, when a file is imported, you map the fields from the file being imported to the fields in SEED, sometimes the fields that are part of the database and sometimes to extra data fields.

When manually importing a file, you do use the display names to map the fields, not the column names.

If SEED has a "copy column" endpoint, then you could use it to recreate the functionality of changing a column name - create a new column, copy the data over, and delete the old column. Though I guess you'd be making the same calls to the backend twice, once for copying and once for deleting.

That being said, doesn't column deletion already use celery to remove data from buildings? You can use that as a base for whatever rewrites you make to this function.

haneslinger commented 8 months ago

That being said, doesn't column deletion already use celery to remove data from buildings? You can use that as a base for whatever rewrites you make to this function.

Totally, if we end up keeping/modding POST /columns/{id}/rename, it's going in a celery task.

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity within 60 days. It will be closed if no further activity occurs. Thank you for your contributions.

haneslinger commented 6 months ago

@kflemin @nllong, I thought I addressed this but I guess not.... where did we land on this? I think just removing "rename columns" from the page?

RDmitchell commented 6 months ago

@kflemin / @nllong -- I think we want to keep the option for users to rename columns, don't we?

RDmitchell commented 6 months ago

@haneslinger -- I guess I was just talking about column display names.

I think you are right, that users shouldn't need to change the actual field name, and probably shouldn't be allowed to because it might mess up their data mapping world?