NAL-i5K / general_issues

for issues and discussions not tied to a specific repository

Genome assembly update: Drosophila kikkawai #172

Closed mpoelchau closed 3 years ago

mpoelchau commented 3 years ago

This is a fairly straightforward genome assembly update. See https://gitlab.com/i5k_Workspace/workspace_roadmap/-/wikis/Adding-an-organism-CWL-update for a full description of each task (requires a GitLab login).

Rest: Monica and Surya will divide up.

suryasaha commented 3 years ago

This works fine on stage, but I'm getting the error below on prod for the genome assembly. All analysis files were processed without problems.

(i5k) [i5k@i5k-node1 ~]$ python manage.py blast_utility /usr/local/i5k/media/blast/db/GCF_018152535.1_ASM1815253v1_genomic.fna -m
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/i5k/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/usr/local/i5k/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/i5k/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/i5k/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/app/local/i5k/blast/management/commands/blast_utility.py", line 18, in handle
    blast = BlastDb.objects.get(title = title)
  File "/usr/local/i5k/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/i5k/lib/python3.6/site-packages/django/db/models/query.py", line 408, in get
    self.model._meta.object_name
blast.models.DoesNotExist: BlastDb matching query does not exist.

mpoelchau commented 3 years ago

@suryasaha it looks like the genome fasta file is not loaded; you can probably just redo that command, or let me know if there's a problem with the load (per the droeug issue, use just the fasta file name, not the full path):

django=# select app_organism.short_name,blast_blastdb.id,title from blast_blastdb inner join app_organism on ( blast_blastdb.organism_id = app_organism.id ) where short_name = 'drokik';
 short_name | id  |                       title                       
------------+-----+---------------------------------------------------
 drokik     | 518 | GCF_018152535.1_ASM1815253v1_cds_from_genomic.fna
 drokik     | 516 | GCF_018152535.1_ASM1815253v1_rna_from_genomic.fna
 drokik     | 517 | GCF_018152535.1_ASM1815253v1_translated_cds.faa
 drokik     |  33 | Dkik02082011-genome.fa
 drokik     | 278 | GCF_000224215.1_Dkik_2.0_rna.fna
 drokik     | 270 | GCF_000224215.1_Dkik_2.0_genomic.fna
 drokik     | 286 | GCF_000224215.1_Dkik_2.0_rna_from_genomic.fna
 drokik     | 302 | GCF_000224215.1_Dkik_2.0_protein.faa
 drokik     | 294 | GCF_000224215.1_Dkik_2.0_cds_from_genomic.fna
 drokik     |  34 | DKIK.fna
 drokik     |  35 | DKIK.faa
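
To illustrate why the lookup failed: the `title` column above holds bare file names, so a full path can never match, and the new genomic fasta has no row yet in any case. A minimal Python sketch, assuming a hypothetical in-memory set in place of the `blast_blastdb` table and a hypothetical helper `find_title` (neither is part of the i5k code base):

```python
import os

# Hypothetical stand-in for the titles stored in blast_blastdb for drokik
# (a subset copied from the query output above; the new genomic fasta is absent).
LOADED_TITLES = {
    "GCF_018152535.1_ASM1815253v1_cds_from_genomic.fna",
    "GCF_018152535.1_ASM1815253v1_rna_from_genomic.fna",
    "GCF_018152535.1_ASM1815253v1_translated_cds.faa",
    "GCF_000224215.1_Dkik_2.0_genomic.fna",
}


def find_title(arg, loaded_titles):
    """Return the bare file name if a record with that title exists, else None.

    Titles are stored without a directory part, so any path prefix is dropped
    before the lookup.
    """
    title = os.path.basename(arg)
    return title if title in loaded_titles else None


full_path = "/usr/local/i5k/media/blast/db/GCF_018152535.1_ASM1815253v1_genomic.fna"
# Not loaded yet, so there is nothing for blast_utility to find.
print(find_title(full_path, LOADED_TITLES))
# An already loaded fasta is found once the path prefix is dropped.
print(find_title("/some/dir/GCF_000224215.1_Dkik_2.0_genomic.fna", LOADED_TITLES))
```

A check like this before calling `blast_utility` would separate "record missing, run addblast first" from "wrong argument form".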

suryasaha commented 3 years ago

Loading genome without full path as suggested

(i5k) [i5k@i5k-node1 ~]$ python manage.py addblast Drosophila kikkawai -t nucleotide Genome Assembly -f GCF_018152535.1_ASM1815253v1_genomic.fna -d 'Drosophila kikkawai genome assembly, ASM1815383v1'
you can move to makeblastdb and populate sequence step

(i5k) [i5k@i5k-node1 ~]$ python manage.py blast_utility GCF_018152535.1_ASM1815253v1_genomic.fna -m
1 species finished
all done
(i5k) [i5k@i5k-node1 ~]$ python manage.py blast_utility GCF_018152535.1_ASM1815253v1_genomic.fna -p
1 species finished
all done

(i5k) [i5k@i5k-node1 ~]$ python manage.py blast_shown GCF_018152535.1_ASM1815253v1_genomic.fna --shown true
1 species finished
all done

The commands work, but I don't see the new datasets on prod.

mpoelchau commented 3 years ago

The genome assembly now shows up. It looks like the annotation fastas aren't set to 'is shown'. I'd guess you should run those commands again.

django=# select app_organism.short_name,blast_blastdb.id,title,is_shown from blast_blastdb inner join app_organism on ( blast_blastdb.organism_id = app_organism.id ) where short_name = 'drokik';
 short_name | id  |                       title                       | is_shown 
------------+-----+---------------------------------------------------+----------
 drokik     | 519 | GCF_018152535.1_ASM1815253v1_genomic.fna          | t
 drokik     | 518 | GCF_018152535.1_ASM1815253v1_cds_from_genomic.fna | f
 drokik     | 516 | GCF_018152535.1_ASM1815253v1_rna_from_genomic.fna | f
 drokik     | 517 | GCF_018152535.1_ASM1815253v1_translated_cds.faa   | f
 drokik     |  33 | Dkik02082011-genome.fa                            | f
 drokik     | 278 | GCF_000224215.1_Dkik_2.0_rna.fna                  | t
 drokik     | 270 | GCF_000224215.1_Dkik_2.0_genomic.fna              | t
 drokik     | 286 | GCF_000224215.1_Dkik_2.0_rna_from_genomic.fna     | t
 drokik     | 302 | GCF_000224215.1_Dkik_2.0_protein.faa              | t
 drokik     | 294 | GCF_000224215.1_Dkik_2.0_cds_from_genomic.fna     | t
 drokik     |  34 | DKIK.fna                                          | t
 drokik     |  35 | DKIK.faa                                          | t
(12 rows)
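
One way to list the commands that still need to be run is sketched below in Python. The rows are copied by hand from the query output above (a subset), the helper names are hypothetical, and the command string simply mirrors the `blast_shown ... --shown true` call used earlier in this thread:

```python
# Rows copied from the query output above: (title, is_shown).
ROWS = [
    ("GCF_018152535.1_ASM1815253v1_genomic.fna", True),
    ("GCF_018152535.1_ASM1815253v1_cds_from_genomic.fna", False),
    ("GCF_018152535.1_ASM1815253v1_rna_from_genomic.fna", False),
    ("GCF_018152535.1_ASM1815253v1_translated_cds.faa", False),
    ("Dkik02082011-genome.fa", False),
    ("GCF_000224215.1_Dkik_2.0_genomic.fna", True),
]

NEW_ACCESSION = "GCF_018152535.1"


def hidden_titles(rows, accession):
    """Titles belonging to the given assembly whose is_shown flag is false."""
    return [title for title, shown in rows
            if title.startswith(accession + "_") and not shown]


def blast_shown_commands(titles):
    """Build one blast_shown invocation per title."""
    return [f"python manage.py blast_shown {t} --shown true" for t in titles]


for cmd in blast_shown_commands(hidden_titles(ROWS, NEW_ACCESSION)):
    print(cmd)
```

Filtering on the accession prefix leaves older hidden records such as `Dkik02082011-genome.fa` untouched.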

mpoelchau commented 3 years ago

@suryasaha it looks like the description (which is what shows up in the app UI) is incorrect: it says ASM1815383v1, but the assembly name in the fasta title is ASM1815253v1. For clarity, I'm showing only the query result for the genome fasta:

django=> select app_organism.short_name,blast_blastdb.id,title,blast_blastdb.description from blast_blastdb inner join app_organism on ( blast_blastdb.organism_id = app_organism.id ) where short_name = 'drokik';
 short_name | id  |                       title                       |                    description
------------+-----+---------------------------------------------------+---------------------------------------------------
 drokik     | 519 | GCF_018152535.1_ASM1815253v1_genomic.fna          | Drosophila kikkawai genome assembly, ASM1815383v1
 ...

I will update the description in the database.
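
A mismatch like this could be caught before loading with a small check. The Python sketch below assumes titles follow the layout `<accession>_<assembly name>_<suffix>`, as the RefSeq-derived file names in this thread do. The regular expression, the helper names, and the corrected description string are illustrative assumptions, not part of the i5k code or the value actually stored:

```python
import re

# Assumed title layout: <accession>_<assembly name>_<suffix>, e.g.
# GCF_018152535.1_ASM1815253v1_genomic.fna. Older titles such as DKIK.fna
# do not follow it and yield None.
TITLE_RE = re.compile(
    r"^(GC[AF]_\d+\.\d+)_(.+?)_"
    r"(?:cds_from_genomic|rna_from_genomic|translated_cds|genomic|protein|rna)"
    r"\.f[na]a$"
)


def assembly_name(title):
    """Extract the assembly name from a title, or None if the layout differs."""
    match = TITLE_RE.match(title)
    return match.group(2) if match else None


def description_mentions_assembly(title, description):
    """True if the assembly name parsed from the title occurs in the description."""
    name = assembly_name(title)
    return name is not None and name in description


title = "GCF_018152535.1_ASM1815253v1_genomic.fna"
loaded = "Drosophila kikkawai genome assembly, ASM1815383v1"
# Hypothetical corrected text, for illustration only.
corrected = "Drosophila kikkawai genome assembly, ASM1815253v1"

print(assembly_name(title))
print(description_mentions_assembly(title, loaded))
print(description_mentions_assembly(title, corrected))
```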

mpoelchau commented 3 years ago

I updated the description for the genome for this organism on both stage and prod. @suryasaha can you please verify?

suryasaha commented 3 years ago

Haha.. so did I. Looks good now. Linkouts to jbrowse work fine too @mpoelchau

suryasaha commented 3 years ago

No copyright-free image for the organism page, @mpoelchau. This one might work, but no clear CC license terms are listed: https://bugguide.net/node/view/642813/bgpage