mrship closed this issue 11 years ago.
Also, for Predecessor#to_gff I'm seeing empty strings returned. Is that expected?
@mrship They should all have reference ids.
For each feature, a reference is found first (https://gist.github.com/danmaclean/5437571). This lookup could return nil, perhaps if @experiment.genome_id is a string: playing on the console showed me I needed to use an integer for reference_id in Feature.find(:all).select {|f| f.reference_id == 1}.
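To illustrate the type mismatch: in Ruby an Integer never compares equal to a String holding the same digits, so a select on reference_id matches nothing if the id arrives as a string. A minimal sketch, using a hypothetical FeatureStub struct in place of the real ActiveRecord Feature model and a hypothetical features_for_genome helper, that coerces the id with Kernel#Integer before comparing:

```ruby
# Hypothetical stand-in for the Feature model; only reference_id matters here.
FeatureStub = Struct.new(:name, :reference_id)

# Returns the features whose reference_id matches genome_id.
# Integer() raises ArgumentError on non-numeric input instead of
# silently producing an empty result.
def features_for_genome(features, genome_id)
  id = Integer(genome_id)
  features.select { |f| f.reference_id == id }
end

features = [FeatureStub.new("a", 1), FeatureStub.new("b", 2), FeatureStub.new("c", 1)]

# Comparing against the raw string finds nothing, because 1 == "1" is false.
p features.select { |f| f.reference_id == "1" }.size  # prints 0

# Coercing first finds the expected records.
p features_for_genome(features, "1").map(&:name)       # prints ["a", "c"]
```

Whether genome_id really arrives as a string in the controller is an assumption to check on the console.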
The reference is then added as an attribute and saved back, so provided it found the proper reference, we should be OK.
Reasons it might not have found the proper reference:
This gist (https://gist.github.com/danmaclean/5437636) shows the part of experiments_controller.rb responsible for reading the uploaded GFF file. It isn't removing trailing newlines (Bio::Record::GFF::GFF3 should do this, but it isn't skipping newlines), so perhaps everything is going through silently.
This last one looks like a good bet: this code gives a completely empty object rather than just dying:
g = Bio::GFF::GFF3.new("\n")
=> ##gff-version 3
. . . . . . . . . .
I'm supposing this will come down to how we add data, rather than how Rails itself retrieves it!
Predecessor#to_gff shouldn't return empty strings, but I bet this is a knock-on effect of the empty GFF. Do any of your GFFs have empty lines at the bottom?
@mrship Actually, when loading the GFF, each line from the GFF file should be a Bio::GFF::GFF3::Record, not a Bio::GFF::GFF3. The two are similar and give nearly the same results in this test, but the difference may be causing problems elsewhere:
> g = Bio::GFF::GFF3::Record.new("\n")
=> . . . . . . . . .
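One way to guard the upload path is to hand the parser one line at a time and drop blank lines (and `#` comment or directive lines) first, so a trailing newline can no longer turn into an all-dots record. A minimal sketch with a hypothetical each_gff_record helper; the caller's block is where the controller would build a Bio::GFF::GFF3::Record from each line, which is left out here so the example runs without BioRuby:

```ruby
# Yields each feature line of GFF3 text with its newline removed, skipping
# blank lines and '#' comment/directive lines. Stops at a ##FASTA directive,
# since everything after it is sequence data rather than features.
# Without a block, returns an Enumerator.
def each_gff_record(gff_text)
  return enum_for(:each_gff_record, gff_text) unless block_given?

  gff_text.each_line do |raw|
    line = raw.chomp
    break if line.start_with?("##FASTA")
    next if line.strip.empty? || line.start_with?("#")

    yield line
  end
end

upload = "##gff-version 3\n" \
         "Chr1\tTAIR9\tthree_prime_UTR\t11649\t11863\t.\t-\t.\tParent=AT1G01030.1\n" \
         "\n"

p each_gff_record(upload).to_a.size  # prints 1
p each_gff_record("\n").to_a         # prints []
```

In the controller this would be used as something like each_gff_record(upload).map { |line| Bio::GFF::GFF3::Record.new(line) }; how the uploaded file is actually read there is not shown in this thread.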
OK, I'm going to look at wrapping some tests around the code so we can get to the bottom of it. It may be due to my testing with rogue data, but until we can definitively (and easily) test the code, it will be difficult to see exactly where the problems lie. I'll crack on with that tomorrow.
OK, after a head-scratching morning working through how Features are created, I've reverted to a simple test for a Feature: I import the test FNA and GFF and look at the output from #to_gff.
Under the old method, I get:
Chr1 TAIR9 three_prime_UTR 11649 11863 . - . Parent=AT1G01030.1
Under your revised method, I get:
Chr1 TAIR9 three_prime_UTR 11649 11863 0.0 - 0 Parent=AT1G01030.1;gfu_id=60
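The two outputs differ in the score and phase columns (`.` versus `0.0` and `0`) and in the extra gfu_id attribute. GFF3 uses `.` for an undefined field, so a missing score or phase should probably be written back as `.` rather than defaulted to zero. A minimal sketch of a hypothetical gff3_line formatter that does this; it is not the project's #to_gff:

```ruby
# Builds one tab-separated GFF3 line. Any nil among the first eight columns
# is written as ".", the GFF3 marker for an undefined field, instead of
# being coerced to 0 or 0.0. Attributes come from an ordered Hash and are
# joined as key=value pairs; values are NOT percent-encoded here, although
# real GFF3 requires escaping characters such as ; = , & in values.
def gff3_line(seqid, source, type, start, stop, score, strand, phase, attributes)
  columns = [seqid, source, type, start, stop, score, strand, phase].map do |value|
    value.nil? ? "." : value.to_s
  end
  attrs = attributes.map { |key, value| "#{key}=#{value}" }.join(";")
  (columns << (attrs.empty? ? "." : attrs)).join("\t")
end

line = gff3_line("Chr1", "TAIR9", "three_prime_UTR", 11649, 11863,
                 nil, "-", nil, { "Parent" => "AT1G01030.1" })
puts line
```

With nil score and phase this reproduces the old-method line above (with tabs between columns). Whether the stored score and phase are actually nil or already 0 in the database is an assumption worth checking.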
I'll keep trying to wrap some tests around the logic, and my head around it too!
GFF is a swear word in bioinformatics sometimes...
OK, I'll leave well alone for now then :smile:
I've got a working version of some very simple specs that have helped me pin down the problems with the rake repo:export task's GFF output. I'll create a PR that reflects those changes.
@danmaclean When exporting the Feature dataset with #to_gff, I've come across Features that don't have a reference_id, which causes the code to blow up. Any thoughts as to why some Features don't have reference_ids?
I'm wary of digging into this code too much, as I don't really know what it does and there are no tests for it. If we can work out a quick fix, I can release a feature to write GFF data and complete the repository dump.
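As a possible stop-gap while the root cause is tracked down, the export could skip Features without a reference_id and report them instead of calling #to_gff on them. A minimal sketch, assuming a hypothetical ExportFeature struct in place of the real model and a hypothetical export_gff helper; its to_gff is only a placeholder for the real Feature#to_gff:

```ruby
# Hypothetical stand-in for the Feature model.
ExportFeature = Struct.new(:id, :reference_id) do
  # Placeholder for the real Feature#to_gff, which needs a reference to work.
  def to_gff
    raise ArgumentError, "feature #{id} has no reference" if reference_id.nil?

    "ref#{reference_id}\tfeature#{id}"
  end
end

# Splits features into those that can be exported and those missing a
# reference_id, so one bad row no longer aborts the whole dump.
# Returns [gff_lines, skipped_ids].
def export_gff(features)
  exportable, missing = features.partition { |f| !f.reference_id.nil? }
  [exportable.map(&:to_gff), missing.map(&:id)]
end

lines, skipped = export_gff([ExportFeature.new(1, 7), ExportFeature.new(2, nil)])
puts lines
warn "Skipped features without reference_id: #{skipped.join(', ')}" unless skipped.empty?
```

Logging the skipped ids keeps the gaps visible, so the dump can ship now and the missing references can be investigated separately.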
Let me know.