RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Reimport Modomics data #129

Open · AntonPetrov opened this issue 7 years ago

AntonPetrov commented 7 years ago

At least one species-specific ID shows several species at once (example):

[Screenshot (2017-02-22 09:30:14): the example species-specific ID listing several species]

The problem occurs when the same sequence carries the same modifications in multiple species: the Accession model entry gets overwritten by whichever species is loaded last, so the Modomics data needs to be reimported from scratch.
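A minimal sketch of how such an overwrite can happen. It assumes, hypothetically, that the accession ID is an MD5 of the sequence and its modifications with no species component, which would fit the "<md5>_modomics" shape of the IDs listed later in this thread. The helper names and the sequence are made up for illustration and are not RNAcentral's import code: keying on sequence and modifications alone lets the last species loaded win, while adding the taxid to the key keeps one record per species.

```python
import hashlib


def accession_id(sequence, modifications, taxid=None):
    """Hypothetical ID scheme: MD5 of sequence + modifications (+ taxid if given)."""
    parts = [sequence, ";".join(modifications)]
    if taxid is not None:
        parts.append(str(taxid))  # species-aware key
    digest = hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
    return f"{digest}_modomics"


def load(records, include_taxid):
    """Store records by ID; a later record with the same ID replaces an earlier one."""
    store = {}
    for rec in records:
        taxid = rec["taxid"] if include_taxid else None
        store[accession_id(rec["sequence"], rec["mods"], taxid)] = rec
    return store


# Same (made-up) sequence and modifications reported for two species.
records = [
    {"sequence": "GCGGAUUUAGCUCAG", "mods": ["m1A58"], "species": "Xenopus laevis", "taxid": 8355},
    {"sequence": "GCGGAUUUAGCUCAG", "mods": ["m1A58"], "species": "Rattus norvegicus", "taxid": 10116},
]

species_blind = load(records, include_taxid=False)
species_aware = load(records, include_taxid=True)
print(len(species_blind), [r["species"] for r in species_blind.values()])
# prints: 1 ['Rattus norvegicus']
print(len(species_aware), sorted(r["species"] for r in species_aware.values()))
# prints: 2 ['Rattus norvegicus', 'Xenopus laevis']
```

Under this assumption, a reimport that keys accessions on species as well as sequence would avoid the collision.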

The problem was originally reported by Sean.

blakesweeney commented 7 years ago

There are at least 23 xrefs/accessions with this issue. We can find them by doing:

```sql
select
  *
from xref
join rnc_accessions acc
  on xref.ac = acc.accession
where
  acc.database = 'MODOMICS'
  and (
    (acc.species = 'Xenopus laevis' and xref.taxid != 8355)
    or (acc.species = 'Rattus norvegicus' and xref.taxid != 10116)
    or (acc.species = 'Zea mays' and xref.taxid != 4577)
    or (acc.species = 'Thermus thermophilus' and xref.taxid != 274)
    or (acc.species = 'Salmonella typhimurium' and xref.taxid != 90371)
    or (acc.species = 'Phaseolus vulgaris' and xref.taxid != 3885)
    or (acc.species = 'Oryctolagus cuniculus' and xref.taxid != 9986)
    or (acc.species = 'Triticum aestivum' and xref.taxid != 4565)
  )
;
```
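The OR chain in the query above could also be driven by a small species-to-taxid lookup table, so that checking another species means adding one row instead of another condition. A minimal Python/sqlite3 sketch, assuming a hypothetical temporary table named expected_taxid and toy rnc_accessions/xref tables that carry only the columns the query touches (the real schema is PostgreSQL and has more). The accession names in the toy data are made up.

```python
import sqlite3

# Species -> expected taxid, copied from the OR conditions in the query above.
EXPECTED_TAXIDS = [
    ("Xenopus laevis", 8355),
    ("Rattus norvegicus", 10116),
    ("Zea mays", 4577),
    ("Thermus thermophilus", 274),
    ("Salmonella typhimurium", 90371),
    ("Phaseolus vulgaris", 3885),
    ("Oryctolagus cuniculus", 9986),
    ("Triticum aestivum", 4565),
]


def find_mismatches(conn):
    """Return (ac, species, actual taxid, expected taxid) for mismatched MODOMICS xrefs."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS expected_taxid (species TEXT PRIMARY KEY, taxid INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO expected_taxid VALUES (?, ?)", EXPECTED_TAXIDS)
    return conn.execute(
        """
        SELECT xref.ac, acc.species, xref.taxid, want.taxid
        FROM xref
        JOIN rnc_accessions AS acc ON xref.ac = acc.accession
        JOIN expected_taxid AS want ON acc.species = want.species
        WHERE acc.database = 'MODOMICS' AND xref.taxid != want.taxid
        ORDER BY xref.ac
        """
    ).fetchall()


# Toy tables with only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rnc_accessions (accession TEXT PRIMARY KEY, database TEXT, species TEXT);
    CREATE TABLE xref (ac TEXT, taxid INTEGER);
    INSERT INTO rnc_accessions VALUES
        ('rat_modomics', 'MODOMICS', 'Rattus norvegicus'),
        ('maize_modomics', 'MODOMICS', 'Zea mays');
    INSERT INTO xref VALUES ('rat_modomics', 10090), ('maize_modomics', 4577);
    """
)
print(find_mismatches(conn))  # [('rat_modomics', 'Rattus norvegicus', 10090, 10116)]
```

As in the original query, MODOMICS rows whose species is not in the lookup are simply not checked, because the inner join drops them.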

It appears to be limited to Modomics: running the same search without the acc.database = 'MODOMICS' constraint returns the same results.
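That check amounts to a set comparison: if the mismatching accessions are identical with and without the MODOMICS filter, no other database contributes any. A small sketch of the logic on toy data, assuming a hypothetical expected_taxid lookup table in place of the OR chain and a made-up 'OTHER_DB' source. It illustrates the reasoning only and does not reproduce the real results.

```python
import sqlite3


def mismatched_acs(conn, only_modomics):
    """Accessions whose xref taxid disagrees with the expected taxid for their species."""
    sql = """
        SELECT DISTINCT xref.ac
        FROM xref
        JOIN rnc_accessions AS acc ON xref.ac = acc.accession
        JOIN expected_taxid AS want ON acc.species = want.species
        WHERE xref.taxid != want.taxid
    """
    if only_modomics:
        sql += " AND acc.database = 'MODOMICS'"
    return {row[0] for row in conn.execute(sql)}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE expected_taxid (species TEXT PRIMARY KEY, taxid INTEGER);
    CREATE TABLE rnc_accessions (accession TEXT PRIMARY KEY, database TEXT, species TEXT);
    CREATE TABLE xref (ac TEXT, taxid INTEGER);
    INSERT INTO expected_taxid VALUES ('Rattus norvegicus', 10116);
    INSERT INTO rnc_accessions VALUES
        ('rat_modomics', 'MODOMICS', 'Rattus norvegicus'),
        ('rat_other', 'OTHER_DB', 'Rattus norvegicus');
    INSERT INTO xref VALUES ('rat_modomics', 10090), ('rat_other', 10116);
    """
)

with_filter = mismatched_acs(conn, only_modomics=True)
without_filter = mismatched_acs(conn, only_modomics=False)
# Equal sets mean no non-MODOMICS accession is affected.
print(with_filter == without_filter, without_filter - with_filter)  # True set()
```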

Fixing the accessions can be done with:

```sql
-- Apply all corrections in one transaction so a failure leaves xref unchanged
begin;

-- Update Xenopus
update xref
  set taxid = 8355
where
  ac in ('dd7318229bd33f71098d491b437b97dd_modomics',
         '5b638f7a6fb817e74ea1fc05eb7aca6a_modomics')
  and taxid != 8355
;

-- Update rat
update xref
  set taxid = 10116
where
  ac in ('b5f224875fe4b55c0f8c79ff9e1c4b96_modomics')
  and taxid = 10090
;

-- Update maize
update xref
  set taxid = 4577
where
  ac in ('331f68e0cd1ed4d69e6ce052f24d432c_modomics')
  and taxid = 3562
;

-- Update thermus
update xref
  set taxid = 274
where
  ac in ('367fff4928ff6e45035eccd25315ae9d_modomics',
         '3fe73d3ce3932d8042cec7866b809ac0_modomics')
  and taxid = 300852
;

-- Update Salmonella
update xref
  set taxid = 90371
where
  ac in ('3c9f4214774de3fd21eb099235728829_modomics')
  and taxid = 562
;

-- Update kidney bean
update xref
  set taxid = 3885
where
  ac in ('ae35295009e8a43132a40112333bbcc1_modomics')
  and taxid = 3847
;

-- Update rabbit
update xref
  set taxid = 9986
where
  ac in ('484a32153536ff19c456a10b99106d82_modomics',
         '2b7215b44be5fa48144480e645e1d4b1_modomics')
  and taxid != 9986
;

-- Update wheat
update xref
  set taxid = 4565
where
  ac in ('1cc28fdd9201cb2ab75c52dd846b649f_modomics',
         '851cde158600ec1bb7cb41827451d795_modomics',
         '512c6fee503aae504bbab970efa856a1_modomics',
         '9499216fa6efa3c4f2b9a7bbb8dc1548_modomics',
         '1dd5a0399007891f1ec154143a319d21_modomics')
  and taxid != 4565
;

commit;
```
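The per-species statements above could also be driven from a single accession-to-taxid mapping applied inside one transaction. A minimal Python/sqlite3 sketch on a toy xref table with only the ac and taxid columns; the mapping copies a few accessions and taxids from the statements above, while the starting taxids in the toy table are made up. Note the design choice: it uses the broader "taxid != correct" guard from the Xenopus, rabbit and wheat statements rather than the specific wrong-taxid guards (for example "taxid = 10090" for rat), so it would also rewrite xrefs carrying some other wrong taxid.

```python
import sqlite3

# Accession -> correct taxid. A dict cannot hold duplicate keys, so an accession
# listed twice (as in the rabbit statement) ends up as a single entry.
FIXES = {
    "dd7318229bd33f71098d491b437b97dd_modomics": 8355,   # Xenopus
    "b5f224875fe4b55c0f8c79ff9e1c4b96_modomics": 10116,  # rat
    "484a32153536ff19c456a10b99106d82_modomics": 9986,   # rabbit
}


def apply_fixes(conn, fixes):
    """Update every listed accession inside one transaction; return rows changed."""
    changed = 0
    with conn:  # commits if the block succeeds, rolls back if any statement raises
        for ac, taxid in fixes.items():
            cur = conn.execute(
                "UPDATE xref SET taxid = ? WHERE ac = ? AND taxid != ?",
                (taxid, ac, taxid),
            )
            changed += cur.rowcount
    return changed


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE xref (ac TEXT, taxid INTEGER);
    INSERT INTO xref VALUES
        ('dd7318229bd33f71098d491b437b97dd_modomics', 8364),
        ('b5f224875fe4b55c0f8c79ff9e1c4b96_modomics', 10090),
        ('484a32153536ff19c456a10b99106d82_modomics', 9986);
    """
)
print(apply_fixes(conn, FIXES))  # 2: the rabbit row already had the right taxid
print(apply_fixes(conn, FIXES))  # 0: re-running changes nothing
```

Because every update is guarded by "taxid != correct", the script is idempotent and reports how many rows it actually touched.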