epam / NGB

New Genome Browser (NGB) - a Web-based NGS data viewer with unique Structural Variations (SVs) visualization capabilities, high performance, scalability, and cloud data support
MIT License

Motifs search #526

Open NShaforostov opened 2 years ago

NShaforostov commented 2 years ago

Background

In NGB, we can search for a specific nucleotide sequence in different references using BLAST. However, BLAST is complicated to deploy and configure, and it takes a long time to process a request. For smaller search tasks, e.g. searching for a specific nucleotide sequence in a certain reference genome, it might make sense to implement a "built-in" search engine. For example, such a search is used to find motifs. Sequence motifs are short, recurring patterns in DNA that are presumed to have a biological function.

Approach

We should provide the ability for users to:

Start search

Add a new item, e.g. "Motifs search", to the "General" dropdown menu of the reference track. To start the motifs search, the user should click that item in the menu: image

Search form

After that, the search pop-up appears, e.g.: image

It should contain:

User should:

Results

After that, the search shall be performed over the opened chromosome/reference:

When the search is finished, a table should appear in the "MOTIFS" panel. The table with the specific search result details (all found motif matches) should contain columns:

Also, the following shall be shown for the motif search result details:

image

The user shall be able to click a row in this details table and view the selected search result on special tracks. Note: for the user's convenience, the row under the mouse pointer shall be highlighted. After a successful motifs search, these tracks shall appear automatically in the "Browser" panel at the same time as the "MOTIFS" panel. There should be two tracks, one for results on the forward strand and one for results on the reverse strand:

Example of the result: image
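A minimal sketch of what a built-in search over both strands could look like, for illustration only. It assumes the pattern may contain IUPAC ambiguity codes (as in the AACWWRY test pattern used later in this thread) and that reverse-strand matches are found by searching for the reverse complement of the pattern. The function names and the result tuple layout are hypothetical and are not taken from the NGB code base.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Reverse complement of a pattern written with IUPAC codes."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def find_motif(sequence, pattern):
    """Return sorted (start, end, strand, matched_text) tuples.

    Coordinates are 1-based and inclusive, given on the forward strand.
    A pattern character outside the IUPAC table raises KeyError.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pat in (("+", pattern.upper()), ("-", reverse_complement(pattern))):
        # A lookahead is used so that overlapping matches are reported too.
        regex = re.compile("(?=(" + "".join(IUPAC[c] for c in pat) + "))")
        for m in regex.finditer(sequence):
            start = m.start() + 1
            hits.append((start, start + len(pat) - 1, strand, m.group(1)))
    return sorted(hits)
```

Splitting the returned list by the strand field would give the content of the two result tracks. Note that a palindromic pattern such as TATA is reported on both strands at the same coordinates.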

If the user clicks the button above the table with the search details, they shall be navigated to the table with the full list of motifs searches. This table should contain columns:

In this table, search results should be shown aggregated by search task, i.e. each new motifs search shall add a new row to this table, e.g.: image

The user shall be able to click a row in this table to open the table with the details of that search result. Note: for the user's convenience, the row under the mouse pointer shall be highlighted.

Other options

Switching and store

Quick results navigation

Motif search results tracks should contain navigation buttons in their header for quickly opening the next or previous result on the track (similar to the current navigation on VCF tracks): image
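One possible way to back the next/previous buttons, sketched under the assumption that the start positions of the matches on a track are available as a sorted list. The function names are hypothetical; binary search from the standard `bisect` module finds the neighbouring result without scanning the whole list.

```python
from bisect import bisect_left, bisect_right


def next_result(starts, position):
    """Start of the first match strictly after `position`, or None."""
    i = bisect_right(starts, position)
    return starts[i] if i < len(starts) else None


def previous_result(starts, position):
    """Start of the last match strictly before `position`, or None."""
    i = bisect_left(starts, position)
    return starts[i - 1] if i > 0 else None
```

Returning None at either end of the list lets the caller disable the corresponding button instead of wrapping around.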

Opening from BAM tracks

User should have the ability to open the motif search from the BAM track as well:

Filters for results table

Users shall have the ability to filter the motif search result details table. For that, add the Filters panel above the details table (under the header):

image

If user selects any filter value:

Notes:

  • user shall have the ability to set several filters simultaneously
  • clicking the clear filters button should reset all filters
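The two notes above can be sketched as follows. This is an illustration only: the rows are assumed to be dictionaries, and the `chromosome` and `strand` keys are hypothetical stand-ins for whatever columns the details table actually has.

```python
def apply_filters(rows, filters):
    """Keep rows that match every active filter.

    Filters whose value is None or empty are treated as not set,
    so several filters can be combined or left blank independently.
    """
    active = {k: v for k, v in filters.items() if v not in (None, "", [])}
    return [r for r in rows if all(r.get(k) == v for k, v in active.items())]


def clear_filters(filters):
    """Return a copy of the filter state with every filter reset."""
    return {k: None for k in filters}
```

For example, `apply_filters(rows, {"chromosome": "1", "strand": "+"})` keeps only rows matching both values, and passing the result of `clear_filters` returns every row again.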

sidoruka commented 2 years ago

SilinPavel commented 2 years ago

For the server side, the following implementation steps are proposed:

Two different methods to search motifs:

SilinPavel commented 2 years ago

@mzueva @rodichenko @AlfiyaRF @sidoruka As discussed, we agreed to additionally add the following features:

Tatyana2022 commented 2 years ago

Bug was found: Errors appear in the console when switching from one chromosome to another in the panel details

Prerequisites: Open console (Ctrl+F12)

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel
  3. Select sample1 dataset
  4. Go to BROWSER
  5. Select any chromosome
  6. Go to REFERENCE track
  7. Select Motifs search in General dropdown menu
  8. Enter CACTGAAACAAAGGGACTGCAGATG sequence in PATTERN field
  9. Set checkbox in Search whole reference field
  10. Click Search button
  11. Go to MOTIFS panel
  12. Click on last search
  13. Open all results
  14. Look at console

Expected result: No errors in the console

Actual result: Errors are displayed in the console:

app.bundle.js:333 Uncaught (in promise) Error: Search position: '181,424,742' is out of range
    at app.bundle.js:333:17331
    at tryCatcher (app.bundle.js:327:17036)
    at Promise._settlePromiseFromHandler (app.bundle.js:327:10401)
    at Promise._settlePromise (app.bundle.js:327:11694)
    at Promise._settlePromise0 (app.bundle.js:327:12736)
    at Promise._settlePromises (app.bundle.js:327:14313)
    at _drainQueueStep (app.bundle.js:327:25948)
    at _drainQueue (app.bundle.js:327:25844)
    at Async._drainQueues (app.bundle.js:327:27857)
    at Async.drainQueues (app.bundle.js:327:25466)
app.bundle.js:333 Uncaught (in promise) Error: Search position: '170,700,313' is out of range
    at app.bundle.js:333:17331
    at tryCatcher (app.bundle.js:327:17036)
    at Promise._settlePromiseFromHandler (app.bundle.js:327:10401)
    at Promise._settlePromise (app.bundle.js:327:11694)
    at Promise._settlePromise0 (app.bundle.js:327:12736)
    at Promise._settlePromises (app.bundle.js:327:14313)
    at _drainQueueStep (app.bundle.js:327:25948)
    at _drainQueue (app.bundle.js:327:25844)
    at Async._drainQueues (app.bundle.js:327:27857)
    at Async.drainQueues (app.bundle.js:327:25466)
app.bundle.js:1034 Uncaught (in promise) TypeError: Cannot set properties of null (setting 'data')
    at MOTIFSTrack.<anonymous> (app.bundle.js:1034:19507)
    at tryCatch (app.bundle.js:29:13733)
    at GeneratorFunctionPrototype._invoke (app.bundle.js:29:16222)
    at GeneratorFunctionPrototype.prototype.<computed> [as next] (app.bundle.js:29:14017)
    at step (app.bundle.js:1034:15636)
    at app.bundle.js:1034:15779
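The root cause is not stated in this report. As a hedged illustration of one defensive pattern, the sketch below checks that a match belongs to the currently opened chromosome and fits within its length before the browser navigates to it. The record layout, the function names, and the chromosome lengths used in the example are all hypothetical.

```python
def in_range(start, end, chromosome_length):
    """True if the 1-based inclusive interval fits inside the chromosome."""
    return 1 <= start <= end <= chromosome_length


def navigable_matches(matches, chromosome, lengths):
    """Keep matches on `chromosome` whose coordinates fit within its length.

    `lengths` maps a chromosome name to its length in base pairs.
    """
    length = lengths[chromosome]
    return [
        m for m in matches
        if m["chromosome"] == chromosome and in_range(m["start"], m["end"], length)
    ]
```

With illustrative lengths `{"chrA": 1000, "chrB": 500}`, a match at 900-924 recorded for chrA is kept when chrA is open and dropped when chrB is open, so no out-of-range position is requested after switching chromosomes.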

@mzueva @AlfiyaRF @rodichenko

Verified

Tatyana2022 commented 2 years ago

Bug was found: Long search motif name is not fully displayed in the details panel

Prerequisites: Sequence = CTTGATCTTCCCTGTGATGTCATCTGGAGCCCTGCTGCTTGCGGTGGCCTATAAAGCCTCCTAGTCTGGCTCCAAGGCCTGGCAGAGTCTTTCCCAGGGAAAGCTACAAGCAGCAAACAGT

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel
  3. Select sample1 dataset
  4. Go to BROWSER
  5. Select CHR: 1
  6. Go to REFERENCE track
  7. Select Motifs search in General dropdown menu
  8. Enter sequence from Prerequisites in PATTERN field
  9. Click Search button
  10. Go to MOTIFS panel

Expected result: The entire sequence should be displayed in the name of the panel search (with a line break if the name is too long)

Actual result: The sequence is not wrapped to another line and is visually truncated (the same behavior with long title name)

@AlfiyaRF @mzueva

Verified

maryvictol commented 2 years ago

Bug was found: Attempt to interrupt search gives empty rows in the Search Results table and error message.

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel and select sample1 dataset
  3. Go to BROWSER
  4. Select any chromosome
  5. Go to REFERENCE track
  6. Select Motifs search in General dropdown menu
  7. Enter CACTGAAACAAAGGGACTGCAGATG sequence in PATTERN field
  8. Enter any value into the Title field
  9. Click Search button
  10. Go to MOTIFS panel
  11. During Loading Results click arrow < to interrupt search and return to the Search Results table

Expected result: Search is interrupted without errors.

Actual result:

@mzueva @AlfiyaRF @rodichenko

Verified

maryvictol commented 2 years ago

@mzueva Bug was found: Only the 1st page of results (100 records) is shown when motifs search details are opened from the search results table.

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel and select sample1 dataset
  3. Go to BROWSER
  4. Select any chromosome
  5. Go to REFERENCE track and select Motifs search in General dropdown menu
  6. Enter AACWWRY sequence in Pattern field
  7. Click Search button
  8. Go to MOTIFS panel
  9. Scroll results to check that the results list includes more than 100 records
  10. Click < button near the motif name to return to the Search Results table
  11. In this table, click the row corresponding to the search from step 7 to open the table with search result details

Expected result: Search result details table opens. All results are shown in the table.

Actual result: Only the 1st page of results (100 records) is shown when motifs search details are opened from the search results table.
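For illustration, a generic sketch of collecting every page of a paged result set rather than only the first one. It assumes a hypothetical `fetch_page(offset, limit)` callable that returns a list of records; the page size of 100 comes from this report. Whether NGB loads the remaining pages eagerly or on scroll is not stated here.

```python
def load_all_results(fetch_page, page_size=100):
    """Call `fetch_page(offset, limit)` until a short page signals the end."""
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:
            # Fewer records than requested: this was the last page.
            return results
        offset += page_size
```

When the total is an exact multiple of the page size, the loop makes one extra call that returns an empty page and then stops.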

Verified

maryvictol commented 2 years ago

@rodichenko Bug was found: Positive and negative tracks aren't shown in the Browser panel after a search if the user has returned from the previous search results to the Search Tasks table using the < button.

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel and select sample1 dataset
  3. Go to BROWSER
  4. Select CHR:1 chromosome
  5. Go to REFERENCE track and select Motifs search in General dropdown menu
  6. Enter TGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGG sequence in Pattern field
  7. Click Search button
  8. Go to MOTIFS panel
  9. Click < button near the motif name to return to the Search Results table
  10. Go to REFERENCE track and select Motifs search in General dropdown menu
  11. Enter AAAGATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCAG sequence in Pattern field
  12. Click Search button

Expected result: Two corresponding tracks (positive and negative) appear in the Browser panel (under the REFERENCE track).

Actual result: Two corresponding tracks (positive and negative) don't appear in the Browser panel (under the REFERENCE track).

Verified

Tatyana2022 commented 2 years ago

Bug was found: Sequence color remains blue despite changing it in color configuration for positive and negative tracks

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel and select sample1 dataset
  3. Go to BROWSER
  4. Select CHR:1 chromosome
  5. Go to REFERENCE track and select Motifs search in General dropdown menu
  6. Enter TGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGG sequence in Pattern field
  7. Click Search button
  8. Go to positive track
  9. Open General->Color
  10. Change color (e.g. green)
  11. Click Save button
  12. Look at positive track

Expected result: Sequence displays in chosen color (e.g. green) for positive track

Actual result: Sequence color has not changed; it remains the default blue

Extra details: The same behavior for negative track

@mzueva @rodichenko @AlfiyaRF Verified

Tatyana2022 commented 2 years ago

Bug was found: No positive and negative tracks in BROWSER panel and MOTIFS panel doesn't reopen after first closing

To reproduce:

  1. Go to NGB
  2. Go to DATASETS and select sample1 dataset
  3. Go to BROWSER
  4. Select CHR:1 chromosome
  5. Go to REFERENCE track and select Motifs search in General dropdown menu
  6. Enter TGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGG sequence in Pattern field
  7. Click Search button
  8. Close MOTIFS panel
  9. Repeat steps 5-7

Expected result: MOTIFS panel should be opened first at the right side in the additional panels. Positive and negative tracks should be displayed in the BROWSER panel

Actual result: MOTIFS panel didn't open in the NGB. No Positive and Negative tracks in the BROWSER panel related to the last search

Extra details: The search request is sent over the network. The same behavior is observed for Motifs search from BAM.

@mzueva @rodichenko @AlfiyaRF Verified

Tatyana2022 commented 2 years ago

Improvements: After a conversation with @mzueva, it was decided to increase the height of the Color Configuration window so that no scrollbar is shown and the palette is not displayed under the Reset to defaults button

@rodichenko @AlfiyaRF Verified

Tatyana2022 commented 2 years ago

Bug was found: Errors appear in the console when switching from one chromosome to another in the panel details

Prerequisites: Open console (Ctrl+F12)

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel
  3. Select sample1 dataset
  4. Go to BROWSER
  5. Select any chromosome
  6. Go to REFERENCE track
  7. Select Motifs search in General dropdown menu
  8. Enter CACTGAAACAAAGGGACTGCAGATG sequence in PATTERN field
  9. Set checkbox in Search whole reference field
  10. Click Search button
  11. Go to MOTIFS panel
  12. Click on last search
  13. Open all results
  14. Look at console

Expected result: No errors in the console

Actual result: Errors are displayed in the console:

app.bundle.js:333 Uncaught (in promise) Error: Search position: '181,424,742' is out of range
    at app.bundle.js:333:17331
    at tryCatcher (app.bundle.js:327:17036)
    at Promise._settlePromiseFromHandler (app.bundle.js:327:10401)
    at Promise._settlePromise (app.bundle.js:327:11694)
    at Promise._settlePromise0 (app.bundle.js:327:12736)
    at Promise._settlePromises (app.bundle.js:327:14313)
    at _drainQueueStep (app.bundle.js:327:25948)
    at _drainQueue (app.bundle.js:327:25844)
    at Async._drainQueues (app.bundle.js:327:27857)
    at Async.drainQueues (app.bundle.js:327:25466)
app.bundle.js:333 Uncaught (in promise) Error: Search position: '170,700,313' is out of range
    at app.bundle.js:333:17331
    at tryCatcher (app.bundle.js:327:17036)
    at Promise._settlePromiseFromHandler (app.bundle.js:327:10401)
    at Promise._settlePromise (app.bundle.js:327:11694)
    at Promise._settlePromise0 (app.bundle.js:327:12736)
    at Promise._settlePromises (app.bundle.js:327:14313)
    at _drainQueueStep (app.bundle.js:327:25948)
    at _drainQueue (app.bundle.js:327:25844)
    at Async._drainQueues (app.bundle.js:327:27857)
    at Async.drainQueues (app.bundle.js:327:25466)
app.bundle.js:1034 Uncaught (in promise) TypeError: Cannot set properties of null (setting 'data')
    at MOTIFSTrack.<anonymous> (app.bundle.js:1034:19507)
    at tryCatch (app.bundle.js:29:13733)
    at GeneratorFunctionPrototype._invoke (app.bundle.js:29:16222)
    at GeneratorFunctionPrototype.prototype.<computed> [as next] (app.bundle.js:29:14017)
    at step (app.bundle.js:1034:15636)
    at app.bundle.js:1034:15779

@mzueva @AlfiyaRF @rodichenko

Verified

@AlfiyaRF @mzueva Bug reproduces again for another sequence: gggttcatgaggaagggcaggaggagggtgtgggatggtg

Verified

Tatyana2022 commented 2 years ago

Bug was found: Positive or negative track is not reopened if a record with the corresponding strand is selected in the table

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel and select sample1 dataset
  3. Go to BROWSER panel
  4. Set CHR: 1
  5. Go to REFERENCE track
  6. Select Motifs search in General dropdown menu
  7. Set TATA sequence in PATTERN field
  8. Click Search button
  9. Close TATA_POSITIVE track in BROWSER panel
  10. Go to MOTIFS details table
  11. Click on any row with positive strand (e.g. first row)
  12. Look at BROWSER panel

Expected result: Corresponding TATA_POSITIVE track should be displayed in BROWSER panel (TATA_NEGATIVE track also remains open).

Actual result: No TATA_POSITIVE track in the BROWSER. Only negative track is displayed in the BROWSER panel

Extra details: If a negative strand was selected in the MOTIFS panel, then the negative one should be redrawn according to the selected row and the positive track should not be opened in the current case

@AlfiyaRF

Verified

Tatyana2022 commented 2 years ago

Bug was found: No more than one pair of Motif's tracks with duplicate titles is displayed in the BROWSER panel

To reproduce:

  1. Go to NGB
  2. Go to DATASETS panel and select sample1 dataset
  3. Go to BROWSER panel
  4. Set CHR: 1
  5. Go to REFERENCE track
  6. Select Motifs search in General dropdown menu
  7. Set CAA sequence in PATTERN field
  8. Click Search button
  9. Repeat steps 5-8
  10. Look at BROWSER panel

Expected result: 2 pairs of CAA_POSITIVE and CAA_NEGATIVE tracks should be displayed in the BROWSER panel for each launched search

Actual result: Only one pair of Motif tracks (positive and negative) displays in the BROWSER panel after search

Extra details: The same behavior for all searches with duplicate titles
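A hedged sketch of one way to let searches with the same title coexist: key each pair of tracks by a per-search identifier instead of by the displayed title. The `TrackRegistry` class and its methods are hypothetical and only illustrate the idea. The `_POSITIVE` and `_NEGATIVE` suffixes follow the track names mentioned in this report.

```python
from itertools import count


class TrackRegistry:
    """Stores result tracks keyed by (search id, strand), not by title."""

    def __init__(self):
        self._ids = count(1)
        self.tracks = {}

    def add_search(self, title):
        """Register a positive and a negative track for a new search."""
        search_id = next(self._ids)
        for strand in ("POSITIVE", "NEGATIVE"):
            # The displayed name may repeat; the key stays unique.
            self.tracks[(search_id, strand)] = f"{title}_{strand}"
        return search_id
```

Calling `add_search("CAA")` twice leaves four entries in `tracks`, two named CAA_POSITIVE and two named CAA_NEGATIVE, because the second search no longer overwrites the first.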

@mzueva @AlfiyaRF @rodichenko

Verified