Closed dondi closed 1 year ago
@mihirsamdarshi will move to this after finishing #475 and #798.
@kdahlquist do you have an example file for me to use while working on this?
I've made one quickly, but haven't had a chance to test it with GRNmap yet. I added a row for "species" in the "optimization_parameters" sheet and put "Saccharomyces cerevisiae" in there. I added a row called "taxon_id" and put 559292 in the field. Note that on a previous issue, we discussed that for yeast, Uniprot expects taxon ID 559292 for yeast, while JASPAR expects taxon ID 4932 for yeast and @jlopez616 did something to resolve this, but I don't remember what. I think we will need to discuss this further at the meeting tomorrow and yeast might not be the best species to use to test this.
15-genes_28-edges_db5_Dahlquist-data_input_with-species.xlsx
we actually parse the entire sheet, and add it to the network metadata, so this was relatively simple to add, and is a part of the #796 PR
The complicated part is the testing. I'm torn as to whether to require the user to just look up the taxon ID or to let them use the genus/species name. There is always the problem of misspelling or choosing the wrong ID.
I spent a lot of time going through the code, and documenting the pieces that I need with both comments and more info with notes on my iPad. I created an object to serve as the name/taxon/id list. It's similar to a dictionary with a key: value where the key is the simple english name and the value is another object "dictionary" where the key is variable name and the value is the respective value for the var name and english name. What I have done to work around the issue with Jaspar and Uniprot to have two different variables, which can be the same or different depending on what is needed. I also wrote a method for reading in and setting the values for the GRNstate. It runs for both the taxon id or species name, which allows for some user error in the input from the workbook. The workbook can also be expanded to allow for a Jaspar and Uniprot id. This is where tests will need to be written and I will need help doing that. I also ran into a problem with the current test. I can leave the GRNstate vars set to yeast to allow the tests to pass, but I don't think this is proper technique. What I want to discuss is how we should update our tests and possible sample data when we have the ability to read in the species. Does it become mandatory, or a feature that may or may not be used. This will all change how the tests are written and handled. Currently, using the sample data from this thread, I can say that my method for setting worked.
To answer one question, uploading the species name/id is not mandatory.
One thing we talked about was the form of the species name that is supplied by the user in the GRNmap workbook. Since the species name has a space character in between the genus and species name and the API queries fill this space with an underscore, we talked about where the underscore needs to be added.
@dondi suggests that the conversion be done at the point we call the API, in case we encounter API's that want a different format.
Let's create an info box right about the layout box on the left hand side of the interface so there can be a passive indicator to the user which species is in use. We talk about when to actively notify the user later.
Discussed having a "Data" menu item for both @afiller1 and @Kevboe 's features, but it is something that is accessed more rarely so it should not take up prime real estate.
For now, let's stick with defaulting to yeast initially.
@Kevboe please aim to issue a pull request showing this change by Monday. As a stretch goal, you can explore a possible UI treatment for how the user might change that value. (e.g., a modal dialog where the user just enters the species name and taxon ID, for starters) We can iterate over this design next.
@Kevboe Merge beta
into your branch if you haven't done so recently; a few fixes have landed there which you may find helpful.
@Kevboe If you do accomplish the stretch goal, this will allow us to verify whether the current grnState
species and taxon are being correctly passed along to the gene page APIs.
@Kevboe For testing, you will want to follow the pattern where:
I was able to accomplish the goal of having a menu on the sidebar that both displays the identified species' Latin name, as well have an indicator of whether the species was able to be identified. Along with the display I added a drop down menu that lets the user select a species if the identified one was incorrect, or just to change it if they want to. The code I wrote works, but I feel that it could be more dynamic than it currently is, and think that I will rewrite it next week. The menu is also not aesthetically pleasing, but I think that this can be solved with some simple CSS changes. The biggest issue that I ran into was merging beta into my working menu branch. I resolved all merge conflicts, but when I went back to test my code I was running into errors launching it. The errors seemed to come from code which I had not changed. Once the errors are resolved, and the visuals are up to standard I will be ready for the pull request.
There is a commit on my menu branch with a working version of the menu pre beta merge, so you can see what work was done, and how it works.
I’m seeing mostly “Module not found” errors there, which leads me to think that a fresh npm install will do the trick. If a plain npm install doesn’t do it, exercise the “nuclear option” and delete node_modules first, and then npm install.
The permission error up top is less clear. I’ll have to think of that one a bit more.
Ran the npm install, here is what I have now. We can talk about this tomorrow.
The follow-up error shown indicates that npm install
did work, but now the issue is an unresolved merge conflict. @Kevboe was given some tips on how to ensure that all merge conflicts are resolved:
git diff
to make sure all conflict markers have been resolved>>>>>
since that occurs pretty much only when git has marked a conflictGoals @Kevboe for the next two weeks:
I have reviewed beta 4.0.12 and have the following comments:
All of the proposed changes to the design have been made. The species menu is now in line with the other menus. It is a collapsible section titled Species and currently only populated with a single species selection drop down menu. The menu allows the user to select the species of the spreadsheet they uploaded. The drop down menu also updates to reflect the species that the program reads in, if it can. I also made the grammatical changes. Finally, I started to add my changes to the GRNstate to the pass through for the gene pages. I have only started, but from what I can tell it looks like I was able to have all the GRNstate data port through, and even thought I see an error in the console, it looks like all of the APIs are correctly queried and return data. All this work is on the same branch that was merged.
User interface looks good upon initial review but we can go deeper after it goes on beta.
Next steps given this progress:
taxon*
parameter names; master version currently uses just taxon
(no suffix)Here is some feedback on the UI currently on beta (v4.0.13):
All suggested changes have been made, the pull requests have been issued and accepted. It was more work than I initially expected to add the mirrored panel to the top, but I think it turned out phenomenal, and is both visually pleasing and very functional. In addition to that I was able to go through my code, and,with some help from @dondi for the regexes, fix the API code. From what I can see I was able to successfully pass in all the variables and all of the APIs are queried properly.
Now that UI is mostly settled (with just one font change request to make it more consistent with other sidebar panels) and the API calls restored, the next immediate step for @Kevboe is to go through a cycle of real-world testing, particularly factoring in that the species might now change.
Logic for switching from YeastMine to other InterMine databases will also need to be done next, with the switch being based on the chosen species. The content in these database is expected to vary depending on the species (see issue #724).
@Kevboe should look up a small number genes from species other than S. cerevisiae and create test workbooks with those genes and test to make sure that calls involving other species now still work. The general theme is overall testing of new functionality to make sure things work.
The error message for missing/incorrect parameters should also be updated to cover the possibility that the wrong species is being queried.
The current error message is this:
No gene information was retrieved for YHP1.
This could have happened because either GRNsight could not access the gene information from one of the source databases or because no information exists for the gene in the source databases.
_You can check back later to see if gene information can be retrieved or submit an issue to https://github.com/dondi/GRNsight.__
Please change it to this:
No gene information was retrieved for YHP1.
This could have happened because:
You can check back later to see if gene information can be retrieved or submit an issue to https://github.com/dondi/GRNsight._
Made the changes that Dr. Dahlquist asked and issued a PR to beta. Also worked on and presented for the URS last week which (I think) went well. For the creation of small species specific networks I ran into issues that I need help with, and was not sure how to move on. When I was creating the networks, and uploading them I was getting errors from, but only sometimes. I couldn't tell if it was a problem with the code, or that the gene that I picked didn't exist in all databases. I believe the problem is with the data, which needs to be verified as existing, but it could be an issue with the code.
Verified that the new error message is displaying in beta 4.0.16
This needs to be tested with all six species that we support.
I.e., you need to create six input workbooks that have what is expected for each species.
Then you need to handle the issue when a user uploads a species that we don't expect.
Cases:
When all sheets are finished and manually verifiable, they can be added to our test set and unit tests can be written around them.
Warnings to user:
"No species information was detected in your input file. GRNsight defaults to Saccharomyces cerevisiae. You can change the species selection in the Species menu or panel."
"GRNsight detected the species
For now, let's not get fancy about detecting species we don't support. The user can figure it out if they made a typo or something.
We need to document on the Documents webpage what we expect the formatting of the species information to be in an uploaded workbook.
These are the species/taxon pairings that need to be tested.
Error message has been made, and the test file have been created. I was able to manually test, bugfix, and verify it works, but did not make tests yet. I have issued a pull request to beta with these changes.
backlogging all gene page issues
This task is being separated from #698. GRNsight needs to read the species name and/or taxon ID from a GRNmap file and load it into GRNstate.
@kdahlquist needs to do a quick test to make sure that this won't mess up GRNmap. Then she will provide a sample file (probably will alter demos 1 and 2.
Note that we need to tackle other issues before moving on to this one.