darogan / labrador

A web based tool to manage and automate the processing of publicly available datasets.
GNU General Public License v3.0

Processing tab #1

Open jcgrenier opened 7 years ago

jcgrenier commented 7 years ago

Hi @darogan,

Maybe we can discuss here the proposals and modifications I would like to make to Labrador. Is it normal that the Processing tab does not show when creating a new project? Will it appear once some data becomes available?

Furthermore, how are you using Docker to process your jobs? Are you installing Cluster Flow too?

Thanks a lot! JC

darogan commented 7 years ago

Hi @jcgrenier.

I don't launch processing via Labrador, as @ewels was intending to remove, or has already removed, this functionality from Labrador. So I can't really comment on this feature.

I do the bulk of my processing via ClusterFlow as this fits in well with the HPC set up here in Cambridge.

In my previous job, I did have Docker running a ClusterFlow pipeline (https://bitbucket.org/cegx-bfx/cegx_bsexpress_docker) which worked very well. So wrapping Labrador & ClusterFlow in a Docker container should work...

A bit more background on my setup with Labrador: I maintain a server hosting the data, with a Labrador interface per research group. Processing is initiated manually, usually with ClusterFlow. I added support for MultiQC reports, custom PDFs and Excel files to Labrador, so that output files from the processing steps are displayed inline.

jcgrenier commented 7 years ago

Super, that seems like what I want to do as well. So, after processing your samples with ClusterFlow, do you add the information about your samples manually through the interface?

Do you add those MultiQC reports manually via the interface too, or do you work with the MySQL database manually?

Sorry for all those questions, I'm kind of excited about this browser!

Thanks again. JC

ewels commented 7 years ago

+1 for not using Labrador for running jobs. This was the predecessor for Cluster Flow (basically just automating the writing of bash scripts), but it doesn't scale well and got out of hand quite quickly. It also means that you have to be extremely careful with security, as it essentially means that anyone can run anything on your server.

As Russell says, Labrador and CF should play well together though, as they were both written by me and intended to be used side by side (neither of these tools was originally intended for public release! 😉 ).

I guess that Labrador should be able to show MultiQC reports easily enough, if contained in a folder that is linked to a project (though I haven't tried it). Any HTML output should be easy enough to display using an iframe.
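To illustrate the iframe idea, here is a minimal sketch in Python. It is not Labrador's code (Labrador is written in PHP), and the function name `iframe_for_report`, its default height, and the example path are all assumptions made for illustration.

```python
from html import escape


def iframe_for_report(report_url, height=800):
    """Return an HTML iframe tag embedding the report at report_url.

    Hypothetical helper for illustration only; not part of Labrador.
    """
    # Escape the URL so quotes or angle brackets cannot break the attribute.
    safe_url = escape(report_url, quote=True)
    return f'<iframe src="{safe_url}" width="100%" height="{int(height)}"></iframe>'


# Example: embed a MultiQC report that sits in a folder served by the web server.
print(iframe_for_report("reports/multiqc_report.html"))
```

Any self-contained HTML report could be embedded the same way, provided the web server is able to serve the file at that URL.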

Phil

jcgrenier commented 7 years ago

Hi @ewels,

I did not intend to use Labrador to run jobs, but rather to generate bash scripts and then run them on a cluster, separate from the server where the database will be hosted.
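As a rough illustration of that workflow, the sketch below writes one bash script per sample, which could then be copied to a cluster and submitted there. It is only a sketch under assumptions: the function `write_job_script`, the command `my_pipeline --input`, and the `jobs` directory are hypothetical and do not come from Labrador or Cluster Flow.

```python
import shlex
from pathlib import Path


def write_job_script(sample, command, out_dir):
    """Write a bash script that runs `command` on `sample`; return its path.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    script = out_dir / f"{sample}.sh"
    # Quote every argument so sample names containing spaces stay intact.
    command_line = " ".join(shlex.quote(part) for part in [*command, sample])
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",  # stop the script at the first failing command
        command_line,
    ]
    script.write_text("\n".join(lines) + "\n")
    return script


# Example: "my_pipeline --input" is a placeholder, not a real tool.
for sample in ["s1.fastq.gz", "s2.fastq.gz"]:
    print(write_job_script(sample, ["my_pipeline", "--input"], "jobs"))
```

Generating scripts on one machine and running them on another keeps the web server from executing anything itself.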

I'm pretty new to PHP, so I'm still learning all of this. I'm super interested in being able to add tables, PDFs/documents and reports manually to the samples and projects though!

Thanks! JC

darogan commented 7 years ago

MultiQC in Labrador works a charm, and is picked up automatically if the file is in the Data set directory.

[Screenshot, 2017-03-16 at 16:09:23]

jcgrenier commented 7 years ago

That seems pretty neat!

Is there any particular directory structure to respect in order to see those reports? I have tried several things so far, such as creating an "overview" subdirectory in my project directory, putting the multiqc_report.html file there directly, and using a particular report type like "fastqc", but I'm still unable to see anything.

Furthermore, I'm getting a weird error when I click on the "Files" thumbnail.

"DataTables warning: table id=DataTables_Table_0 - Requested unknown parameter '5' for row 0. For more information about this error, please see http://datatables.net/tn/4"

Thanks! JC

jcgrenier commented 7 years ago

Hi @darogan @ewels ,

I just saw that the report section depends entirely on the name given in the Datasets section. So apparently we can import a MultiQC report directly. How exactly can we do this?

Thanks!!! JC

darogan commented 7 years ago

Hi @jcgrenier. The actual files aren't entered into the Labrador database (only the metadata you enter is stored). Files located in the data directory are scanned for filenames matching the patterns set in the config files. So as long as your files conform to these conventions, or you define new ones, Labrador will find and display the reports inline.

As Labrador doesn't store any information on the files, they can be modified, added or deleted as the analysis progresses, and Labrador just displays the latest versions.
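The scanning approach described above can be sketched roughly as follows. This is an illustration in Python, not Labrador's actual PHP implementation; the `REPORT_PATTERNS` table and the `find_reports` function are hypothetical, and the real filename conventions are whatever is set in Labrador's config files.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical filename conventions; the real ones live in Labrador's
# config files and may differ.
REPORT_PATTERNS = {
    "multiqc": ["multiqc_report.html"],
    "fastqc": ["*_fastqc.html"],
}


def find_reports(data_dir, patterns=REPORT_PATTERNS):
    """Return {report_type: [paths]} for files under data_dir whose names match."""
    found = {report_type: [] for report_type in patterns}
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        for report_type, globs in patterns.items():
            if any(fnmatch(path.name, pattern) for pattern in globs):
                found[report_type].append(path)
    return found
```

Because nothing about the files is stored, calling `find_reports` again after files are added, replaced or deleted simply reflects the current contents of the directory.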