Closed by asaunier, 7 years ago
Temporary workaround applied in the demo API code so that the migration passes:
diff --git a/c2corg_api/models/image.py b/c2corg_api/models/image.py
index 647d88c..03b9bef 100644
--- a/c2corg_api/models/image.py
+++ b/c2corg_api/models/image.py
@@ -49,8 +49,7 @@ class _ImageMixin(object):
@declared_attr
def filename(self):
return Column(String(30),
- nullable=False,
- unique=(self.__name__ == 'Image'))
+ nullable=False)
Is there a chance that this issue will be fixed for alpha3? I wanted to reimport the demo DB but I notice we still have this problem. If it is about to be fixed, I will wait before running the migration script. Otherwise I could do it now.
I have not worked on this yet. I'm pretty sure you can reimport it now.
Should we do something about this issue?
We should check which images have this problem, and why. Maybe it could be a simple manual fix in the old database.
DETAIL: Key (filename)=(1279011081_1740341383.jpg) already exists.
c2corg=# select id, filename from images where filename = '1279011081_1740341383.jpg';
id | filename
--------+---------------------------
227183 | 1279011081_1740341383.jpg
227184 | 1279011081_1740341383.jpg
(2 rows)
It seems that 2 image documents share the same filename: http://www.camptocamp.org/images/227183/fr/0007- http://www.camptocamp.org/images/227184/fr/0007- => 2 distinct documents (created with an interval of a few seconds) but both pointing to http://s.camptocamp.org/uploads/images/1279011081_1740341383.jpg
This is not an isolated case. In our copy of the v5 DB
select * from (
SELECT id,
filename,
ROW_NUMBER() OVER(PARTITION BY filename ORDER BY id asc) AS row
FROM images
) dups
where
dups.row > 1;
returns many duplicates (sometimes there even are up to 7 documents with the same filename!).
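For illustration, the logic of the ROW_NUMBER-based detection query above can be sketched in plain Python. This is only a minimal stand-in (an in-memory sqlite3 table with an assumed two-column schema, seeded with the duplicate pair from this thread), not the actual v5 database:

```python
import sqlite3

# In-memory stand-in for the v5 "images" table (schema assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO images (id, filename) VALUES (?, ?)",
    [
        (227183, "1279011081_1740341383.jpg"),  # duplicate pair from the thread
        (227184, "1279011081_1740341383.jpg"),
        (300000, "unique_file.jpg"),
    ],
)

# Equivalent of ROW_NUMBER() OVER (PARTITION BY filename ORDER BY id asc)
# filtered on row > 1: keep the first (lowest) id per filename, report the
# rest as duplicates.
seen = {}
duplicates = []
for image_id, filename in conn.execute("SELECT id, filename FROM images ORDER BY id"):
    if filename in seen:
        duplicates.append((image_id, filename))
    else:
        seen[filename] = image_id

print(duplicates)  # → [(227184, '1279011081_1740341383.jpg')]
```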
Since the image documents share the same filename, I assume this is due to a problem with the upload mechanism. It looks pretty safe to me to delete the duplicates (maybe some associations could be lost, but this seems rather insignificant to me and will most likely not occur).
We still have this problem (the migration script has just crashed on this error because I had forgotten the workaround mentioned above).
@stef74 @gottferdom do you think we could avoid those errors in the v5 DB?
@arnaud-morvan Do we really need this unique constraint in the v6 model? Couldn't it cause the file (which may be shared by other image documents) to be deleted when the image removal service is implemented?
Like I said, at first glance it seems quite safe to remove these duplicates in V5. Problem is, we don't know how they occur, so doing a pass now doesn't guarantee the DB will be in a correct state when migrating data at a later time. Stupid question: could this V5 DB modification (assuming we can do it safely) be included at the beginning of the migration script?
I would prefer the changes are done in v5, including the addition of a "unique" constraint on the filename attribute.
I've removed the unique constraint on the filename to automate the migration process. This could possibly be reverted later.
@arnaud-morvan can we close this now ?
We might want to restore the constraint once the duplicated V5 images have been treated.
I think it's better to keep the issue open until the golive, because it depends on actions of the associations on the v5 DB.
What am I supposed to do? 😀
Running the detection script just before the read-only switch and removing the duplicates via v5 should be sufficient IMHO.
@lbesson exactly! :)
Could you confirm that existing duplicates have already been handled?
It would probably be good if we could already check that this works as expected together with our migration script.
@asaunier I can't confirm that all existing duplicates have been handled on the production DB, since some new ones could have been created since I last checked against them (we haven't fixed the bug that creates duplicates). But I can (almost - never say never) confirm that the procedure (run the script to get the list of duplicates, then use the v5 interface to delete them without deleting the underlying image file) is OK.
@tsauerwein It's not a script to run against a database or db dump, just some extra code in v5 to make sure that when deleting an image document, we don't delete the attached image file if the image is in fact a duplicate. So I'm unsure how this could be tested against your migration code (apart from removing remaining dups from db, then making a dump..)
I just wanted to make sure there wasn't a bunch of duplicates still to handle :) I know that some new duplicates may be created in the meantime and that handling them when v5 is made read-only might be needed.
@lbesson we plan to set v5 read-only on Monday, Dec 5th and go live on the 8th. We have to fix the duplicates before the 6th.
> handling them just before v5 is made read-only might be needed
remember that handling the duplicates is done via v5 web interface :)
You mean that manual actions must be done in the web interface?
Note that we need the V5 dump as soon as possible for running the migrations. We have 3 migrations to do (topoguide, forum and images), then deploying to the prod environment, and some controls and possibly some fixes after that, before the GoLive. Can we plan to use the Sunday-to-Monday nightly V5 dump to start the migration on Monday morning? I would like v5 to be set read-only on Sunday evening.
yes, you first list the duplicates via a DB script, and then use the web interface to delete them as a moderator. Note that it is preferable to delete the duplicate with the highest id, since it is the one without image meta information. That is, if you have duplicates 12345 / 12346 (ids are not always consecutive, and sometimes more than two documents reference the same image file), you should go to image document 12346 in the v5 web interface and delete it.
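To make the manual step above less error-prone, the selection rule can be sketched as a small helper (hypothetical code, not part of the v5 code base): given the (id, filename) pairs returned by the detection query, it reports, for each duplicated filename, which document to keep (lowest id, the one carrying the image meta information) and which ones to delete via the web interface:

```python
from collections import defaultdict

def deletion_candidates(rows):
    """Given (id, filename) pairs, return {filename: (keep_id, delete_ids)}.

    Per the procedure described above: keep the document with the lowest id
    and delete the others through the v5 web interface as a moderator.
    """
    by_filename = defaultdict(list)
    for image_id, filename in rows:
        by_filename[filename].append(image_id)

    plan = {}
    for filename, ids in by_filename.items():
        if len(ids) > 1:
            ids.sort()
            plan[filename] = (ids[0], ids[1:])
    return plan

# Example with the duplicate pair mentioned in this thread:
rows = [(227183, "1279011081_1740341383.jpg"),
        (227184, "1279011081_1740341383.jpg"),
        (300000, "unique_file.jpg")]
print(deletion_candidates(rows))
# → {'1279011081_1740341383.jpg': (227183, [227184])}
```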
There were no duplicates in the v5 prod DB last night, before v5 was made read-only, I think. Not this morning either. Should be OK, I close the issue. Thank you very much @lbesson!
When running the v5>v6 migration script, it crashes because:
Seems related to that PR: https://github.com/c2corg/v6_api/pull/331