DMPRoadmap / roadmap

DCC/UC3 collaboration for a data management planning tool
MIT License

Example answers displaying at wrong time and named incorrectly #381

Closed sjDCC closed 7 years ago

sjDCC commented 7 years ago

Example answers should only show for users with the same institutional affiliation. I created a DMP for ESRC + Strathclyde and saw a number of example answers. The URL in one suggests these all come from St Andrews. They're recorded as ESRC though. (Screenshot: escr-st-andrews-example)

xsrust commented 7 years ago

I have confirmed that this is reflected in the database and not a problem with display

xsrust commented 7 years ago

With the Old Database: Before the migration

dmproadmap=# Select organisation_id from suggested_answers where text like '%School Computing Officer%';
 organisation_id
-----------------
         9009236
       850994612
       850994612
       850994612
(4 rows)

dmproadmap=# Select name from Organisations where id = 9009236 or id = 850994612;
                 name
--------------------------------------
 University of St Andrews
 Economic and Social Research Council
(2 rows)

This suggests that the migration is functioning correctly: this suggested answer is associated with University of St Andrews in one place and with ESRC in three places.

sjDCC commented 7 years ago

Urgh - so maybe we have bad data...

I don't think any funders have example answers, only institutions. Could the institutional example answers have become funder examples inadvertently? For example, when a new version of the funder template was released? Perhaps the institutional customisations come through on cloning but get saved under the template owner's affiliation?
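The suspected failure mode can be sketched in a few lines of plain Ruby. This is only an illustration of the hypothesis, not the DMPonline code: the `Answer` and `Template` structs, both clone methods, and the sample answer text are invented for this example.

```ruby
# Hypothetical, simplified model. None of these names come from DMPonline.
Answer = Struct.new(:text, :organisation)
Template = Struct.new(:owner, :version, :answers)

# Suspected faulty behaviour: each copied answer is re-saved under the
# template owner, so the original author's affiliation is lost.
def clone_losing_owner(template)
  copies = template.answers.map { |a| Answer.new(a.text, template.owner) }
  Template.new(template.owner, template.version + 1, copies)
end

# Expected behaviour: each copied answer keeps the organisation that wrote it.
def clone_keeping_owner(template)
  copies = template.answers.map { |a| Answer.new(a.text, a.organisation) }
  Template.new(template.owner, template.version + 1, copies)
end

# Made-up v1 funder template carrying one institutional example answer.
V1 = Template.new(
  "ESRC", 1,
  [Answer.new("Sample text mentioning the School Computing Officer",
              "University of St Andrews")]
)

BAD_CLONE = clone_losing_owner(V1)
GOOD_CLONE = clone_keeping_owner(V1)

puts BAD_CLONE.answers.map(&:organisation).inspect
puts GOOD_CLONE.answers.map(&:organisation).inspect
```

If versioning behaved like `clone_losing_owner`, every new funder template version would quietly move institutional examples to the funder, which would match what the database shows.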

Could this explain why multiple additional sections are showing on templates too, e.g. in issue #371?

Maybe I need to go through and clean up the DMPonline data? Can you give me a dump of all the example/suggested answers and affiliations, additional sections and guidance by question?

From recollection, I don't think this will affect Tuuli, as they don't have many example answers or customisations.

xsrust commented 7 years ago

Viewing on DMPonline (as an org-admin for ESRC), this exact text exists on ESRC's Template as their suggested answer for the past 3 versions of the phase. Is this necessarily an error with the data or intentional?

sjDCC commented 7 years ago

Ok, this must be a data issue then. ESRC definitely doesn't have any example answers. I'm not sure when St Andrews created them, but I assume it was on v1 of the ESRC template, and the cloning process to create a new version has then caused them to belong to ESRC?

The final two examples under the sections "Copyright and IP" and "Responsibilities" are clearly from St Andrews as their URLs are in the examples. I don't know how I can determine where the other ones have come from though - probably St Andrews too? Is there anything in the history, e.g. a user ID for who created them?

I think we need to go through the dump of example answers and try to untangle the data mix-up. When we figure out what has come from where, they should be re-associated with that org, but I can't see how to do that...

xsrust commented 7 years ago

There are no user_ids recorded for the creation of backend structures (template/phase/version/section/suggested_answer/guidance/guidance_groups). These just get associated with organisations. We have around 500 suggested answers (for the DMPonline dataset), so I'm not sure how feasible it will be to go through them all manually.

sjDCC commented 7 years ago

Thanks @xsrust. I spoke with @vyruss about this on Thursday and said the ones to highlight are those attached to funder templates. Can you query how many that brings it down to? I think I'll have to go through them manually regardless of the number.

Any suggested answers on institutional templates will be fine as they are the only people who can add them there. The template owner and suggested answer owner will always match, unlike with funder templates which can be customised.
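The rule above could be turned into a simple filter. The sketch below is plain Ruby over invented sample rows, not a query against the real schema: the `Row` struct, its field names, and `needs_review?` are all assumptions. It also assumes the narrower reading that an answer is suspect when it sits on a funder template and is recorded against the funder itself.

```ruby
# Hypothetical rows standing in for templates joined to suggested answers.
Row = Struct.new(:template_owner, :template_owner_type, :answer_owner, :text)

# Flag answers on funder templates that are recorded against the funder,
# since funders are not expected to have example answers of their own.
# Answers on institutional templates always match their owner, so they pass.
def needs_review?(row)
  row.template_owner_type == :funder && row.answer_owner == row.template_owner
end

# Made-up sample data for illustration only.
SAMPLE_ROWS = [
  Row.new("ESRC", :funder, "ESRC", "sample A"),
  Row.new("ESRC", :funder, "University of St Andrews", "sample B"),
  Row.new("University of St Andrews", :institution, "University of St Andrews", "sample C")
].freeze

FLAGGED = SAMPLE_ROWS.select { |r| needs_review?(r) }
puts FLAGGED.map(&:text).inspect
```

Only "sample A" is flagged here. A broader check would flag every answer on a funder template regardless of its recorded owner.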

vyruss commented 7 years ago

I'm getting a lengthy list of 941 SuggestedAnswers whose Organisation is a funder...

2.0.0-p247 :032 > SuggestedAnswer.where(organisation_id: Organisation.where(organisation_type_id: 754667110)).count
   (4.5ms)  SELECT COUNT(*) FROM `suggested_answers` WHERE `suggested_answers`.`organisation_id` IN (SELECT `organisations`.`id` FROM `organisations` WHERE `organisations`.`organisation_type_id` = 754667110)
 => 941

xsrust commented 7 years ago

Querying the un-migrated DMPonline data, it looks worse, as Jimmy just said.

# sas and texts are not initialised in the original paste;
# presumably they were set earlier in the console session.
sas = 0
texts = []
# Walk every funder template down to its questions, tallying suggested answers.
Dmptemplate.funders_templates.each do |t|
  t.phases.each do |p|
    p.versions.each do |v|
      v.sections.each do |s|
        s.questions.each do |q|
          sas += q.suggested_answers.count
          texts += q.suggested_answers.pluck(:text)
        end
      end
    end
  end
end

The loop above gathers all suggested_answers associated with funder templates. It returns 1255 suggested answers, 240 of which have unique text. The difference between Jimmy's numbers and mine is accounted for by my query also pulling in customisers' suggested_answers.
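For anyone checking their own instance, the total-versus-unique count can be reproduced without the accumulator variables. This is a minimal sketch over hypothetical nested hashes rather than the ActiveRecord models; `all_answer_texts` and the sample data are invented for illustration.

```ruby
# Flattens template -> phases -> versions -> sections -> questions
# into a single array of suggested-answer texts.
def all_answer_texts(templates)
  templates.flat_map do |template|
    template[:phases].flat_map do |phase|
      phase[:versions].flat_map do |version|
        version[:sections].flat_map do |section|
          section[:questions].flat_map { |question| question[:suggested_answers] }
        end
      end
    end
  end
end

# Made-up data: the same text appears on two versions of one phase.
SAMPLE_TEMPLATES = [
  { phases: [
    { versions: [
      { sections: [{ questions: [{ suggested_answers: ["alpha", "beta"] }] }] },
      { sections: [{ questions: [{ suggested_answers: ["alpha"] }] }] }
    ] }
  ] }
].freeze

TEXTS = all_answer_texts(SAMPLE_TEMPLATES)
puts "total: #{TEXTS.size}, unique: #{TEXTS.uniq.size}"
```

Texts copied across versions inflate the total but not the unique count, which is consistent with the large gap between 1255 and 240.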

sjDCC commented 7 years ago

Could someone pull me those 240 with unique text and give me a CSV or Excel file with 'example answer', 'question', 'template', 'organisation' and any other relevant fields you can think of?

I'll try and piece together what the correct affiliation should be...
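One possible shape for that export, sketched with Ruby's standard `csv` library. The row hashes, their keys, and `unique_answers_csv` are assumptions for illustration. A real export would build the rows from the database instead of the sample array.

```ruby
require "csv"

HEADERS = ["example answer", "question", "template", "organisation"].freeze

# Keeps the first row seen for each distinct answer text, then renders CSV.
def unique_answers_csv(rows)
  unique = rows.uniq { |row| row[:text] }
  CSV.generate do |csv|
    csv << HEADERS
    unique.each do |row|
      csv << [row[:text], row[:question], row[:template], row[:organisation]]
    end
  end
end

# Made-up rows: the first two share the same answer text.
SAMPLE = [
  { text: "alpha", question: "Q1", template: "T v1", organisation: "Org A" },
  { text: "alpha", question: "Q1", template: "T v2", organisation: "Org B" },
  { text: "beta",  question: "Q2", template: "T v2", organisation: "Org B" }
].freeze

CSV_TEXT = unique_answers_csv(SAMPLE)
puts CSV_TEXT
```

Deduplicating on text alone drops the other organisations attached to a repeated answer, so for the clean-up it may be more useful to keep every row and sort by text instead.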

vyruss commented 7 years ago

Should we close as this is a legacy data error and does not affect Roadmap?

xsrust commented 7 years ago

Did we determine whether the cause was a mechanism of the old codebase, or somebody modifying data, e.g. through the super_admin interface?

Others who have deployed the DMPonline codebase may have a similar issue with their data if it was due to the code around versioning. It may be worthwhile to flag this to them as a possible issue.

sjDCC commented 7 years ago

Given that the examples of bad data are all on templates that have multiple versions, I think this is due to poor code on the template cloning / versioning process in DMPonline v4. It's worth flagging to those who host their own instances to check their suggested answer data. Can we note it in the migration documentation?

Can one of you close this once the bad data has been cleaned / fixed to release DMPonline test, please? It doesn't affect Roadmap code.

vyruss commented 7 years ago

Done, data migrated.