nathandunn closed this issue 8 years ago
@nathandunn Yes, even I noticed the slow download. Perhaps we can review the code tomorrow and see if it can be sped up.
Yeah, I think that is a good idea.
It takes about 1 minute to export 9 annotations. I made a small change, but the real culprit is simply that we run a recursive SQL query:
convertToEntry(...){
    // ...
    gffEntries.add(entry);
    for (Feature child : featureRelationshipService.getChildren(feature)) {
        if (child instanceof CDS) {
            convertToEntry(writeObject, (CDS) child, source, gffEntries);
        } else {
            convertToEntry(writeObject, child, source, gffEntries);
        }
    }
    //..
}
This is partly because we look for children even when the feature type cannot have any (e.g., CDS, exon), and that covers a lot of our features. We should only look for children when the feature is a coding transcript or a gene.
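A minimal sketch of that guard, written in plain Java rather than against the real Apollo classes. FeatureNode, GuardedExporter, the PARENT_TYPES set and the lookups counter are all hypothetical names made up for illustration; the in-memory map only stands in for featureRelationshipService.getChildren so that the number of lookups can be counted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a feature; not Apollo's Feature domain class. */
final class FeatureNode {
    final String uniqueName;
    final String type;

    FeatureNode(String uniqueName, String type) {
        this.uniqueName = uniqueName;
        this.type = type;
    }
}

final class GuardedExporter {
    // Assumption: only these types can have children worth exporting.
    static final Set<String> PARENT_TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("gene", "mRNA", "transcript")));

    private final Map<String, List<FeatureNode>> childrenByParent = new HashMap<>();
    int lookups = 0; // each call to getChildren stands in for one SQL query

    void addChild(FeatureNode parent, FeatureNode child) {
        childrenByParent.computeIfAbsent(parent.uniqueName, k -> new ArrayList<>()).add(child);
    }

    List<FeatureNode> getChildren(FeatureNode feature) {
        lookups++;
        List<FeatureNode> children = childrenByParent.get(feature.uniqueName);
        return children == null ? Collections.<FeatureNode>emptyList() : children;
    }

    void convertToEntry(FeatureNode feature, List<FeatureNode> entries) {
        entries.add(feature);
        // Leaf types (CDS, exon, ...) never trigger a child lookup.
        if (!PARENT_TYPES.contains(feature.type)) {
            return;
        }
        for (FeatureNode child : getChildren(feature)) {
            convertToEntry(child, entries);
        }
    }
}
```

For a gene with one mRNA that has two exons and a CDS, this does two child lookups (gene and mRNA) instead of five.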
This is going to need a 1-2 day refactor, with each query loaded into a HashMap<String,Feature> where the String is a lookup key:
so this is 4 queries × # sequences, regardless of the # of annotations
Fetches should go into "view" objects instead of domain objects.
For each "parent" we build up a structure of children.
It could be that the keys are just List<> and we use a nice "uniquename" identifier to facilitate a quick and proper lookup based on the uniquename only.
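As a rough sketch of the "view objects plus lookup by uniquename" idea: the rows from one bulk query are indexed by the parent's uniquename, and the export then walks that index without touching the database again. FeatureView and FeatureTree are hypothetical names, and the shape of the row (uniquename, parent uniquename, type) is an assumption about what such a query would return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lightweight "view" row, as opposed to a full domain object. */
final class FeatureView {
    final String uniqueName;
    final String parentUniqueName; // null for top-level features
    final String type;

    FeatureView(String uniqueName, String parentUniqueName, String type) {
        this.uniqueName = uniqueName;
        this.parentUniqueName = parentUniqueName;
        this.type = type;
    }
}

final class FeatureTree {
    /** Groups rows by parent uniquename in a single pass over the query result. */
    static Map<String, List<FeatureView>> indexByParent(List<FeatureView> rows) {
        Map<String, List<FeatureView>> index = new HashMap<>();
        for (FeatureView row : rows) {
            if (row.parentUniqueName != null) {
                index.computeIfAbsent(row.parentUniqueName, k -> new ArrayList<>()).add(row);
            }
        }
        return index;
    }

    /** Depth-first walk that only uses the in-memory index, no further queries. */
    static void flatten(FeatureView root, Map<String, List<FeatureView>> index, List<String> out) {
        out.add(root.uniqueName);
        List<FeatureView> children = index.get(root.uniqueName);
        if (children == null) {
            return;
        }
        for (FeatureView child : children) {
            flatten(child, index, out);
        }
    }
}
```

The number of queries then depends on how many bulk fetches are needed per sequence, not on how many annotations there are.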
After further testing, I don't think that this was as bad as I thought.
@deepakunni3 Just an FYI.
After further testing, I think the problem is that H2 (what the dev environment uses) is REALLY slow for this type of operation. However, it seems to work great against PostgreSQL.
Sorry, didn't mean to close. It can still be optimized.
Just updating this again, but GFF3 output is still really slow.
Here are several runs on the Pythium ultimum data (341 features):
real 3m51.849s
real 4m28.702s
The procedure to calculate the phase at the gff3 service level is particularly expensive
that's pretty slow and those should be easy fixes (all / every is slow)
As mentioned at the meeting, this is still pretty slow. It's about twice as fast as in my last report thanks to an optimization to FeatureRelationshipService::getParentForFeature, so it now takes about 2 minutes instead of the old 4 minutes on the 341 genes in the Pythium ultimum sample data.
Some real quick notes:
To export 18 features takes 1.24 seconds, which is not bad. As @cmdcolin noted earlier, it does a lot of individual and unnecessary feature queries:
I should note that these are all of the form select X from feature where feature.id = ?
For more features (1800) we have this:
Which of course gets exacerbated:
However, digging a bit deeper, it looks like the slowdown is in the individual queries, e.g. getComments does a request on a single feature for a single feature property.
I think the solution is for "extractAttributes" to build a Map of features to their comments up front and then read from that Map per feature, similar to what is done in TranscriptService. This way it's just a single query to populate "comments".
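Something along these lines, sketched in plain Java: one bulk result of (feature id, comment) rows is grouped once, and the per-feature step becomes a map lookup. CommentRow and CommentLookup are hypothetical names, the row shape is an assumption, and joining with a comma is just one possible way to render several comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row from a single bulk query over comment properties. */
final class CommentRow {
    final long featureId;
    final String comment;

    CommentRow(long featureId, String comment) {
        this.featureId = featureId;
        this.comment = comment;
    }
}

final class CommentLookup {
    /** Groups all fetched comments by feature id in one pass. */
    static Map<Long, List<String>> groupByFeature(List<CommentRow> rows) {
        Map<Long, List<String>> byFeature = new HashMap<>();
        for (CommentRow row : rows) {
            byFeature.computeIfAbsent(row.featureId, k -> new ArrayList<>()).add(row.comment);
        }
        return byFeature;
    }

    /** Per-feature step: a map lookup instead of a query; empty string if no comments. */
    static String joinedComments(long featureId, Map<Long, List<String>> byFeature) {
        List<String> comments = byFeature.get(featureId);
        if (comments == null || comments.isEmpty()) {
            return "";
        }
        return String.join(",", comments);
    }
}
```

The same pattern would presumably apply to the other per-feature properties that are currently fetched one at a time.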
This is a random note but fasta needs optimization too. it is slow for different reasons if I recall, namely, reading sequences into and out of database
Thanks. Could you open a different issue for this with any details?
Nathan
:dancer: it is faster.
Fixed this, but then I got another error, not being able to do right-clicks . . so I didn't check this fix in.
Cannot invoke method replaceAll() on null object. Stacktrace follows:
java.lang.NullPointerException: Cannot invoke method replaceAll() on null object
    at org.bbop.apollo.Gff3HandlerService.encodeString(Gff3HandlerService.groovy:343)
    at org.bbop.apollo.Gff3HandlerService.extractAttributes(Gff3HandlerService.groovy:321)
    at org.bbop.apollo.Gff3HandlerService.convertToEntry(Gff3HandlerService.groovy:179)
    at org.bbop.apollo.Gff3HandlerService.convertToEntry(Gff3HandlerService.groovy:185)
    at org.bbop.apollo.Gff3HandlerService.convertToEntry(Gff3HandlerService.groovy:158)
    at org.bbop.apollo.Gff3HandlerService.writeFeature(Gff3HandlerService.groovy:114)
    at org.bbop.apollo.Gff3HandlerService.writeFeatures(Gff3HandlerService.groovy:80)
    at org.bbop.apollo.Gff3HandlerService.writeFeaturesToText(Gff3HandlerService.groovy:60)
    at org.bbop.apollo.SequenceController.$tt__exportSequences(SequenceController.groovy:116)
    at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
    at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
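The trace points at encodeString being handed a null attribute value. A null-safe encoder could look roughly like the sketch below; this is not the actual Gff3HandlerService code, Gff3Encoding is a hypothetical name, and returning an empty string for null is an assumption (skipping the attribute entirely would be another option). The escaped characters are the ones GFF3 reserves in column 9.

```java
/** Hypothetical null-safe encoder for GFF3 attribute values. */
final class Gff3Encoding {
    static String encode(String value) {
        // Assumption: a missing value is rendered as an empty string.
        if (value == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '%':  sb.append("%25"); break;
                case ';':  sb.append("%3B"); break;
                case '=':  sb.append("%3D"); break;
                case '&':  sb.append("%26"); break;
                case ',':  sb.append("%2C"); break;
                case '\t': sb.append("%09"); break;
                case '\n': sb.append("%0A"); break;
                case '\r': sb.append("%0D"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Walking the string once also avoids chaining several replaceAll calls, where encoding the percent sign in the wrong order would double-escape the other replacements.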