MariaNattestad / Assemblytics

Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
http://assemblytics.com
MIT License

differences between the web and CLI version of the engine #21

Closed: 0xaf1f closed this issue 4 years ago.

0xaf1f commented 4 years ago

Comparing the scripts in this repository with those in Assemblytics_web, I found some differences between the two, and I'm not sure why they exist:

diff -ur Assemblytics/Assemblytics_index.py Assemblytics_web/bin/Assemblytics_index.py
--- Assemblytics/Assemblytics_index.py  2019-10-02 15:31:54.203075256 -0700
+++ Assemblytics_web/bin/Assemblytics_index.py  2019-10-02 15:32:15.716400064 -0700
@@ -130,6 +130,7 @@
     fout.write(header+",alignment_length\n") # copy the header

     alignment_length_column = len(header.split(","))
+
     # sorted_by_alignment_length = []
     uniques = []
     repetitives = []
diff -ur Assemblytics/Assemblytics_Nchart.R Assemblytics_web/bin/Assemblytics_Nchart.R
--- Assemblytics/Assemblytics_Nchart.R  2019-10-02 15:31:54.174076167 -0700
+++ Assemblytics_web/bin/Assemblytics_Nchart.R  2019-10-02 15:32:15.712400189 -0700
@@ -80,7 +80,7 @@
                 geom_point(data=both.plot,size=2,alpha=0.5) + 
                 labs(x = paste("NG(x)% where 100% = ",bp_format(genome.length), sep=""),y="Sequence length",colour="Assembly",title="Cumulative sequence length") +
                 scale_color_manual(values=colors) +
-                annotation_logticks(sides="lr",color="black")
+                annotation_logticks(sides="lr")
               )
     } else {
         # To make bacterial genomes at least show a dot instead of an error because  
diff -ur Assemblytics/Assemblytics_uniq_anchor.py Assemblytics_web/bin/Assemblytics_uniq_anchor.py
--- Assemblytics/Assemblytics_uniq_anchor.py    2019-10-02 15:31:54.223074629 -0700
+++ Assemblytics_web/bin/Assemblytics_uniq_anchor.py    2019-10-02 15:32:15.720399938 -0700
@@ -34,7 +34,7 @@

     f = open(filename)
     header1 = f.readline()
-    if header1[0:4]=="\x1f\x8b\x08\x08":
+    if header1[0:2]=="\x1f\x8b":
         f.close()
         f = gzip.open(filename)
         print f.readline().strip()
@@ -146,7 +146,7 @@

     f = open(filename)
     header1 = f.readline()
-    if header1[0:4]=="\x1f\x8b\x08\x08":
+    if header1[0:2]=="\x1f\x8b":
         f.close()
         f = gzip.open(filename)
         header1 = f.readline()
diff -ur Assemblytics/Assemblytics_variant_charts.R Assemblytics_web/bin/Assemblytics_variant_charts.R
--- Assemblytics/Assemblytics_variant_charts.R  2019-10-02 15:31:54.232074346 -0700
+++ Assemblytics_web/bin/Assemblytics_variant_charts.R  2019-10-02 15:32:15.721399907 -0700
@@ -99,7 +99,7 @@
                       scale_fill_manual(values=big_palette,drop=FALSE) + 
                       facet_grid(type ~ .,drop=FALSE) + 
                       labs(fill="Variant type",x="Variant size",y="Count",title=paste("Variants",comma_format(min_var),"to", comma_format(max_var),"bp")) + 
-                              scale_x_continuous(labels=comma_format,expand=c(0,0),limits=c(min_var,max_var)) + 
+                              scale_x_continuous(labels=comma_format,expand=c(0,0),limits=c(min_var-1,max_var)) + 
                               scale_y_continuous(labels=comma_format,expand=c(0,0)) +
                       theme(
                           strip.text=element_blank(),strip.background=element_blank(),
@@ -130,7 +130,7 @@
         }

         print(ggplot(alt,aes(x=size, fill=type,y=..count..+1)) + 
-            geom_histogram(binwidth=abs_max_var/100) + 
+            geom_histogram(binwidth=abs_max_var/100, position="identity",alpha=0.7) + 
             scale_fill_manual(values=big_palette,drop=FALSE) + 
             facet_grid(Type ~ .,drop=FALSE) + 
             labs(fill="Variant type",x="Variant size",y="Log(count + 1)",title=paste("Variants",comma_format(abs_min_var),"to", comma_format(abs_max_var),"bp")) + 
diff -ur Assemblytics/Assemblytics_within_alignment.py Assemblytics_web/bin/Assemblytics_within_alignment.py
--- Assemblytics/Assemblytics_within_alignment.py   2019-10-02 15:31:54.242074032 -0700
+++ Assemblytics_web/bin/Assemblytics_within_alignment.py   2019-10-02 15:32:15.722399876 -0700
@@ -13,7 +13,7 @@

     f = open(filename)
     header1 = f.readline()
-    if header1[0:4]=="\x1f\x8b\x08\x08":
+    if header1[0:2]=="\x1f\x8b":
         f.close()
         f = gzip.open(filename)
         header1 = f.readline()

Are any of these differences consequential?
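The gzip check in the three Python scripts is arguably the most consequential difference. Per RFC 1952, every gzip stream starts with the two magic bytes `\x1f\x8b`. The third byte is the compression method (8 = deflate), and the fourth is a flags byte in which `0x08` (FNAME) means an original file name is stored. The CLI version's four-byte test therefore only recognizes gzip files that carry a stored file name, which the `gzip` command writes by default. The web version's two-byte test accepts any gzip stream. The Python 3 sketch below compares the two checks. `is_gzip_strict`, `is_gzip_magic` and `open_maybe_gzip` are hypothetical helper names, not functions from the Assemblytics source.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # ID1 and ID2 bytes defined by RFC 1952


def is_gzip_strict(header: bytes) -> bool:
    """CLI-version test: magic bytes, CM=8 (deflate) and FLG=0x08 (FNAME set)."""
    return header[:4] == b"\x1f\x8b\x08\x08"


def is_gzip_magic(header: bytes) -> bool:
    """Web-version test: only the two magic bytes."""
    return header[:2] == GZIP_MAGIC


def open_maybe_gzip(path: str):
    """Open `path` for text reading, decompressing transparently if it is gzip."""
    # Read the header in binary mode: under Python 3, reading the first line
    # of a gzip file in text mode can fail to decode.
    with open(path, "rb") as f:
        header = f.read(2)
    if is_gzip_magic(header):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # gzip.compress() stores no file name, so the FLG byte is 0x00.
    stream = gzip.compress(b"example\n")
    print(is_gzip_strict(stream), is_gzip_magic(stream))  # False True
```

Both checks accept a file produced by `gzip somefile`. Only the two-byte check also accepts streams written without a stored name, such as `gzip < somefile`, `gzip -n`, or Python's `gzip.compress()`.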

In the main script there are many unnecessary differences, such as redirecting output to $LOG_FILE in the web version versus printing to the screen in the CLI version, and setting default parameters in one version but not the other. These could probably be handled with a couple of conditionals at the top, so that the same script serves both instances. Having one authoritative copy of the codebase used by both versions would be very beneficial and much less confusing.
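The "couple of conditionals at the top" idea could look roughly like the sketch below. The main script appears to be a shell script (note `$LOG_FILE`), so this Python version is only an illustration. The argument names, the `--log-file` option and the default values are all hypothetical placeholders, not Assemblytics' real interface or defaults.

```python
import argparse
import sys

# Placeholder defaults for illustration only; NOT taken from Assemblytics.
DEFAULTS = {"unique_length": 10000, "min_size": 50, "max_size": 10000}


def parse_args(argv):
    """One argument parser shared by CLI and web runs (hypothetical sketch)."""
    p = argparse.ArgumentParser(description="Shared entry point (sketch)")
    p.add_argument("delta")
    p.add_argument("output_prefix")
    # Optional positionals: a caller may omit them and get the defaults,
    # while CLI users can still pass them explicitly.
    for name in ("unique_length", "min_size", "max_size"):
        p.add_argument(name, type=int, nargs="?", default=DEFAULTS[name])
    # Web runs would pass --log-file; CLI runs leave it unset.
    p.add_argument("--log-file", default=None)
    return p.parse_args(argv)


def log_stream(args):
    """Send progress messages to the log file if one was given, else to stdout."""
    return open(args.log_file, "a") if args.log_file else sys.stdout


if __name__ == "__main__":
    # Example: a CLI-style run relying on defaults and printing to the screen.
    args = parse_args(["sample.delta", "sample_out"])
    out = log_stream(args)
    print(f"delta={args.delta} unique_length={args.unique_length}", file=out)
    if out is not sys.stdout:
        out.close()
```

Making the size parameters optional positionals keeps existing CLI invocations working. A web wrapper can omit them and add `--log-file` to redirect output.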

MariaNattestad commented 4 years ago

Sorry for the delay in answering this. I'm working on consolidating the repos now. I no longer remember why I split Assemblytics into two repositories, but I am now older and smarter. I'll have an update that consolidates them (plus Python 3 support) out this week. Stay tuned!

MariaNattestad commented 4 years ago

All done! The changes are in master, and I cut a new release called 1.1. I also archived the assemblytics_web repo and linked it back to this one, so now we have one central location for it.

0xaf1f commented 4 years ago

Many thanks! I submitted a package to bioconda so we can install it directly from there.