Problem uploading samples

churchlab / millstone

Genome engineering and analysis software

http://churchlab.github.io/millstone/

MIT License

Problem uploading samples #701

Open Zymergen-SBRUBAKER opened 4 years ago

Zymergen-SBRUBAKER commented 4 years ago

Hi, I get an error when uploading samples through the browser. There does not appear to be any documentation on how to do a manual upload using scp.

glebkuznetsov commented 4 years ago

Hi,

Can you describe the error? A common issue is not providing the correct full path to the files in the upload template. The full path to the location that each fastq has been scp'ed needs to be provided. Can you provide the upload template sheet you are using showing paths? (Feel free to provide a partially anonymized screenshot if you prefer).

Thanks, Gleb

Zymergen-SBRUBAKER commented 4 years ago

Thank you! Here is the screenshot. I am trying to upload a pair of files through the web interface. Doing New, Upload Through Browser... I cannot really see a full path represented through this method and there is not a sample sheet.

Maybe I should try the batch upload? I could also try scp but I don't have the full instructions on how to do it and not sure where the files go.

Thanks for your help!

-- Shane Brubaker Computational Biology Architect Zymergen, Inc, sbrubaker@zymergen.com

glebkuznetsov commented 4 years ago

Hello,

Looks like the attachment didn't make it through. Might have to use the Github Issues interface directly for it to go through (rather than email?) If you have any additional details about the failure, please let me know (e.g. is there a delay before the error?)

Sorry we never put up documentation for scp upload. In fact it is the method most users here in the Church Lab use today. Briefly, the steps are something like:

1. scp the files to a location of your choice on the machine running Millstone, e.g. create a data directory /home/ubuntu/raw-data and scp there.
2. On the Samples page in the browser, use New -> From Server Location...
3. Fill in the template linked in that form, with one row per sample and the full path to each file.

Unfortunately, we didn't get around to adding scp instructions to the official docs here: https://millstone.readthedocs.io/en/latest/user_guide/projects_alignments.html

In case it's helpful, here's a draft of more complete documentation we started writing but never got around to posting. Feel free to glance in case there is something helpful there. https://docs.google.com/document/d/1tbPiVaaVqECliw5Eu8xBJ8OxpVHynWpo1_kFEFkoJmU/edit?usp=sharing

Thanks! Gleb
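
Since a wrong or relative path in the template is described above as a common cause of upload failures, a quick check on the server before submitting may help. This is only a sketch: it assumes the template is tab-separated, and the column names in the usage note (Read_1_Path, Read_2_Path) are hypothetical placeholders, so substitute whatever the downloaded template actually uses.

```python
import csv
import io
import os


def missing_sample_paths(template_text, path_columns):
    """Return (row_number, column, path) for each path that is not an absolute, existing file.

    template_text: contents of the filled-in template (assumed tab-separated).
    path_columns:  names of the columns holding file paths (hypothetical names).
    """
    problems = []
    reader = csv.DictReader(io.StringIO(template_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in path_columns:
            path = (row.get(column) or "").strip()
            if not os.path.isabs(path) or not os.path.isfile(path):
                problems.append((row_number, column, path))
    return problems
```

Run it on the machine hosting Millstone, e.g. `missing_sample_paths(open("template.tsv").read(), ["Read_1_Path", "Read_2_Path"])`. An empty list means every listed file was found at its full path.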

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb! I just wanted to give you an update that I may have made a little more progress.

I tried using the second option, batch upload from a template through browser, and it seemed to get through the first file, but got hung up on the second every time.

I then tried the scp. It had a similar issue, where file 1 says copying and file 2 stays in a state of queued to copy.

So I tried a couple of other different files using the batch upload - it seems to have gotten both files.

Now I'm running into some errors on fastqc and alignment step - let me try a couple more things and then I'll let you know how it goes. Thanks for your help!

Shane

glebkuznetsov commented 4 years ago

Hi Shane,

The "queued to copy" state is temporary and should resolve given some time, though usually it's pretty quick.

A couple other issues to check:

What is the EC2 instance type you are using?
How much EBS space is there?

Thanks, Gleb

Zymergen-SBRUBAKER commented 4 years ago

Thanks Gleb!

It looks like it was stuck in queueing. I believe there may have been something weird about those files.

On the other files, they loaded, and then I had to make sure they were really gzipped files, and then I got the fastqc step to work!

That is a good point, I am running on the smallest ec2 instance right now. It does say there is about 1.7GB still free in the upper right corner.

I'll keep an eye on that and I will try the alignment step next. Thanks for all your help so far, I really like Millstone! I will let you know how it goes. :)
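
For anyone hitting the same FastQC problem, one way to confirm that a file named .fastq.gz is really gzip-compressed is to look at its first two bytes. This is a minimal sketch using only the Python standard library; the function names are illustrative and not part of Millstone.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(path):
    """True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def first_fastq_header(path):
    """Decompress just enough to return the first line.

    Raises OSError if the file is not valid gzip data.
    """
    with gzip.open(path, "rt") as handle:
        return handle.readline().rstrip("\n")
```

A file that fails `looks_gzipped` but has a .gz extension is probably plain text that was renamed, and would need to be compressed before upload.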

glebkuznetsov commented 4 years ago

Hey Shane,

Good to hear it's working. Sorry the UI affordances are not perfect. We're expert users here so we hack features as needed :)

As far as instance type, I'd recommend running at least m5.xlarge to make sure you have enough memory for the alignment and subsequent analysis. I'm also concerned you may run out of disk space and would recommend allocating at least 3-5x as much disk as the size of all your FASTQs.

There are a few other tricks we use here to speed things up and make efficient use of AWS, but some of them require manually modifying code and config files, and we haven't really documented this anywhere. For example, when aligning > 50 genomes, we'll normally change the instance type to one with many cores (e.g. c5.9xlarge) and tweak the respective config that says to use all the cores. This actually turns out to be cheaper than running a smaller instance in a more serial model. We'll then change back to a smaller instance type to do analysis / export data.

Anyway, happy to discuss/advise further if you're interested in ramping up your microbial genome alignment/analysis pipelines.

Cheers, Gleb
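
The 3-5x disk rule of thumb above is easy to turn into a number before picking a volume size. The sketch below assumes the FASTQs are already on the server; the helper names are made up for illustration.

```python
import os
import shutil


def fastq_bytes(paths):
    """Total on-disk size of the given FASTQ files, in bytes."""
    return sum(os.path.getsize(path) for path in paths)


def recommended_disk_bytes(total_fastq_bytes, low=3, high=5):
    """Lower and upper disk estimates from the 3-5x rule of thumb."""
    return total_fastq_bytes * low, total_fastq_bytes * high


def has_room(directory, needed_bytes):
    """True if the filesystem holding `directory` has at least needed_bytes free."""
    return shutil.disk_usage(directory).free >= needed_bytes
```

For example, 20 GB of FASTQs gives an estimate of 60 to 100 GB, which can be compared against the volume with `has_room("/home/ubuntu", low_estimate)`.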

Zymergen-SBRUBAKER commented 4 years ago

Thanks so much Gleb!

I have now gotten data in there and it runs FastQC successfully.

I then tried to do an alignment, but it seems to find no variants in the table. When I go to the alignment it says no data in table. The job status says completed. I don't know if it's possible there are no variants - do you have a good test dataset that you would recommend?

Or perhaps this is related to the small machine size, I could try that. The job says it completed in 2 minutes which sounds short (this is e coli).

The log seems to go to the end, but does mention something about an output being truncated. I'm pasting the log output below. Let me know what you think!

==START OF ALIGNMENT PIPELINE FOR SAB2, (411a5775) ==
/home/ubuntu/millstone/genome_designer/conf/../tools/bwa/bwa mem -t 1 -R "@RG\tID:77e5a0a4\tPL:illumina\tPU:77e5a0a4\tLB:77e5a0a4\tSM:77e5a0a4" -a /home/ubuntu/millstone/genome_designer/conf/../temp_data/projects/4064419a/ref_genomes/d89f8976/tmpxft5Bs_MG1655_fasta <(gzip -dc /home/ubuntu/millstone/genome_designer/conf/../temp_data/projects/4064419a/samples/77e5a0a4/s1.fastq.gz) <(gzip -dc /home/ubuntu/millstone/genome_designer/conf/../temp_data/projects/4064419a/samples/77e5a0a4/s2.fastq.gz) | /home/ubuntu/millstone/genome_designer/conf/../tools/samtools/samtools view -bS -
[M::main_mem] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 48443, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (465, 499, 533)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (329, 669)
[M::mem_pestat] mean and std.dev: (499.13, 49.98)
[M::mem_pestat] low and high boundaries for proper pairs: (261, 737)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::worker2@0] performed mate-SW for 3064 reads
[samopen] SAM header is present: 1 sequences.
[M::main_mem] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 48360, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (465, 499, 533)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (329, 669)
[M::mem_pestat] mean and std.dev: (498.82, 49.62)
[M::mem_pestat] low and high boundaries for proper pairs: (261, 737)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::worker2@0] performed mate-SW for 3100 reads
[M::main_mem] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 48333, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (465, 499, 533)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (329, 669)
[M::mem_pestat] mean and std.dev: (499.12, 49.84)
[M::mem_pestat] low and high boundaries for proper pairs: (261, 737)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::worker2@0] performed mate-SW for 3324 reads
[M::main_mem] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 48406, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (465, 499, 533)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (329, 669)
[M::mem_pestat] mean and std.dev: (498.96, 49.80)
[M::mem_pestat] low and high boundaries for proper pairs: (261, 737)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::worker2@0] performed mate-SW for 3006 reads
[M::main_mem] read 60000 sequences (6000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 29029, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (465, 499, 532)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (331, 666)
[M::mem_pestat] mean and std.dev: (498.83, 49.63)
[M::mem_pestat] low and high boundaries for proper pairs: (264, 733)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::worker2@0] performed mate-SW for 1808 reads
[main] Version: 0.7.5a-r405
[main] CMD: /home/ubuntu/millstone/genome_designer/conf/../tools/bwa/bwa mem -t 1 -R @RG\tID:77e5a0a4\tPL:illumina\tPU:77e5a0a4\tLB:77e5a0a4\tSM:77e5a0a4 -a /home/ubuntu/millstone/genome_designer/conf/../temp_data/projects/4064419a/ref_genomes/d89f8976/tmpxft5Bs_MG1655_fasta /dev/fd/63 /dev/fd/62
[main] Real time: 19.494 sec; CPU: 14.312 sec
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_rmdup_core] processing reference U00096...
[bam_rmdup_core] 43 / 230000 = 0.0002 in library '77e5a0a4'
Removed 0 outliers with isize >= 849
==END OF ALIGNMENT PIPELINE==
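
In a log this long the one unusual line is easy to miss. The sketch below pulls out the bracket-tagged messages that contain a few warning words, and totals the reads bwa reports. The keyword list is an arbitrary assumption, and a hit only marks a line worth reading; it does not say whether the pipeline actually failed.

```python
import re

# One "[tag] message" segment; works whether or not the log kept its line breaks.
SEGMENT = re.compile(r"\[([^\]\s]+)\]\s*([^\[]*)")
KEYWORDS = ("truncated", "error", "fail", "killed")  # assumed, not exhaustive


def tagged_messages(log_text):
    """Split the log into (tag, message) pairs."""
    return [(m.group(1), m.group(2).strip()) for m in SEGMENT.finditer(log_text)]


def suspicious_messages(log_text):
    """Return the (tag, message) pairs whose message contains a keyword."""
    return [
        (tag, message)
        for tag, message in tagged_messages(log_text)
        if any(keyword in message.lower() for keyword in KEYWORDS)
    ]


def sequences_read(log_text):
    """Sum the counts from bwa's 'read N sequences' progress lines."""
    return sum(int(n) for n in re.findall(r"read (\d+) sequences", log_text))
```

Applied to the log above, `suspicious_messages` should surface only the `[bam_header_read]` line about the missing EOF marker.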

glebkuznetsov commented 4 years ago

Hey Shane,

No variants sounds surprising. And I agree 2 min sounds fast for an alignment and variant calling. It's hard to tell from the logs whether anything specifically went wrong. I can't remember whether "truncated input" is bad. It is possible the small machine ran out of memory at some point and a process failed in a way that didn't disrupt the rest of the pipeline.

You can look at our unit tests for some example data. For example if you trace through this test https://github.com/churchlab/millstone/blob/master/genome_designer/pipeline/tests/variant_calling/test_variant_calling.py#L138, you'll see variables pointing to an example genome and fastqs:

        self.KNOWN_SUBSTITUTIONS_ROOT = os.path.join(settings.PWD, 'test_data',
                'test_genome_known_substitutions')

        self.TEST_GENOME_FASTA = os.path.join(self.KNOWN_SUBSTITUTIONS_ROOT,
                'test_genome_known_substitutions.fa')

        self.FAKE_READS_FASTQ1 = os.path.join(self.KNOWN_SUBSTITUTIONS_ROOT,
                'test_genome_known_substitutions_0.snps.simLibrary.1.fq')

        self.FAKE_READS_FASTQ2 = os.path.join(self.KNOWN_SUBSTITUTIONS_ROOT,
                'test_genome_known_substitutions_0.snps.simLibrary.2.fq')

And test_data is located here: https://github.com/churchlab/millstone/tree/master/genome_designer/test_data

Zymergen-SBRUBAKER commented 4 years ago

Awesome, thank you Gleb!

I also looked into the Jbrowse and I saw some coverage plots and other things, so I think it definitely did something. I will try your test data next.

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb! I tried out the sample data from E coli that you pointed me to and it looks like it mostly worked. Thanks!

I can see SNPs in the tracks in the browser. However, when I click on the Variants tab on the left side, it always just says "No Data in Table". Also, when I click on the alignment and "go to variants" I get the error message in the screenshot attached.

Thanks again for your help.

Shane

glebkuznetsov commented 4 years ago

Hey Shane,

Looks like the attachment didn't make it through to Github. Maybe it's not going through by email? Can you try uploading directly at the issue URL: https://github.com/churchlab/millstone/issues/701?

Thanks, Gleb

Zymergen-SBRUBAKER commented 4 years ago

Here is the screenshot!

[image: Screen Shot 2019-11-19 at 3 25 39 PM]

Zymergen-SBRUBAKER commented 4 years ago

Thank you Gleb! I have attached the screenshot in the issue. :)

glebkuznetsov commented 4 years ago

Ah, interesting. It looks like it might be a version issue with either Django or Postgres, though it might also be the data.

Are you running Millstone on AWS using our pre-built AMI?

Zymergen-SBRUBAKER commented 4 years ago

We are using the pre-built AMI. I believe our IT person had to clone it from your zone and put it onto a new image in our zone - but essentially it should be the AMI.

Thanks for your help! :)

glebkuznetsov commented 4 years ago

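Hmmm... I tried running the test data myself on a fresh image using our AMI and it seemed to work:

[image: image] https://user-images.githubusercontent.com/233915/69482898-ff9bce80-0dee-11ea-92b8-93ea547e1a0b.png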

One thing is to confirm the postgres on your Millstone instance is the supported version (9.3):

ubuntu@ip-172-30-0-134:~$ psql --version
psql (PostgreSQL) 9.3.15
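
The same version check can be scripted, which may be handy when several instances need to be compared. This is a sketch that assumes Python 3.7+ and that `psql` is on the PATH; it only parses the version string in the format shown above.

```python
import re
import subprocess


def parse_psql_version(output):
    """Extract (major, minor) from text like 'psql (PostgreSQL) 9.3.15'."""
    match = re.search(r"\(PostgreSQL\)\s+(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized psql version output: %r" % output)
    return int(match.group(1)), int(match.group(2))


def installed_psql_version():
    """Run `psql --version` and parse the result."""
    result = subprocess.run(
        ["psql", "--version"], capture_output=True, text=True, check=True
    )
    return parse_psql_version(result.stdout)
```

On the instance shown above, `installed_psql_version() == (9, 3)` would confirm the supported version.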

Zymergen-SBRUBAKER commented 4 years ago

Thanks Gleb! That does indeed look like the screen that I am not able to see.

My Postgres version is 9.3.15.

So you think maybe it could be something about the version of Django or something like that?

Thanks for your help :)

glebkuznetsov commented 4 years ago

Hi Shane,

Hmm.. I was more concerned the Postgres version changed (e.g. due to system update), but that appears not to be the case.

I suspect what might have happened on your end is that variant calling failed, and our UI is not set up very well to reflect this. Reviewing this thread, I was reminded that you are using "the smallest ec2 instance right now. It does say there is about 1.7GB still free in the upper right corner." So I think what might be happening is alignment is failing due to running short on either memory or storage.

We typically use at least an m4.2xlarge (32 GB RAM) and at least 3x as much EBS storage as the FASTQ size (or at least 100 GB). That's what I used for my test yesterday. I recall users trying to use a smaller instance having similar issues.

I think a good bet is to retry on a bigger machine (at least m4.2xlarge) with sufficient storage.

-Gleb

Zymergen-SBRUBAKER commented 4 years ago

Thanks Gleb! It does have some vcf file in the project export by the way.

I was about to try upgrading the size actually anyway, based on your suggestion. My next goal is that I want to try it on some eukaryotic genomes.

So I will work on that and let you know how it goes. Thanks!

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb, could you describe to me a little more how to add extra drive space to the Millstone instance? I am now using some larger data and so have attached an EBS volume to it. And I then use the Server location upload for the files. However, it seems to still fill up the slash drive. I have tried pointing /tmp at my EBS volume too but that does not seem to work. Thanks!

glebkuznetsov commented 4 years ago

Hi Shane,

This isn't well-documented anywhere and the process is a little messy. A better short-term solution might be to spin up a Millstone instance with a bigger disk.

However, if you'd like to try extending to the bigger EBS, I believe the rough steps are:

1. Mount the EBS volume (e.g. at /millstone_data) and make sure you can write to it (e.g. following the directions at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html). You will probably have to update write permissions, i.e. sudo chown -R ubuntu:ubuntu /millstone_data/
2. Move the Millstone files from their previous location to the EBS location: mv /home/ubuntu/millstone/genome_designer/temp_data /millstone_data
3. In ~/millstone/genome_designer/conf/local_settings.py, add/update the param MEDIA_ROOT = '/millstone_data/temp_data'
4. Fix the symlink required by Jbrowse: rm ~/millstone/genome_designer/jbrowse/gd_data, then ln -s /millstone_data/temp_data ~/millstone/genome_designer/jbrowse/gd_data
5. Restart the Millstone server and related processes: supervisorctl restart all

I might have messed up a step or two above, so give that a try if spinning up a new Millstone instance isn't feasible.
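
Because these steps are described as approximate, it may be worth checking the resulting state before restarting the server. The sketch below checks only two things the steps imply: that the new data directory exists and is writable, and that the Jbrowse symlink resolves to it. The function name is made up, and it does not verify local_settings.py or any supervisor configuration.

```python
import os


def migration_problems(media_root, jbrowse_link):
    """Return a list of problems with the moved data directory and its symlink."""
    problems = []
    if not os.path.isdir(media_root):
        problems.append("MEDIA_ROOT directory does not exist: " + media_root)
    elif not os.access(media_root, os.W_OK):
        problems.append("MEDIA_ROOT is not writable by this user: " + media_root)

    if not os.path.islink(jbrowse_link):
        problems.append("not a symlink: " + jbrowse_link)
    elif os.path.realpath(jbrowse_link) != os.path.realpath(media_root):
        problems.append("symlink points elsewhere: " + os.path.realpath(jbrowse_link))
    return problems
```

With the paths from the steps above, the call would be `migration_problems("/millstone_data/temp_data", os.path.expanduser("~/millstone/genome_designer/jbrowse/gd_data"))`, run as the ubuntu user. An empty list means both checks passed.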

Zymergen-SBRUBAKER commented 4 years ago

Thanks I will give that a try!

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb, I just wanted to let you know that I tried these steps, and I now get "502 bad gateway" when I go into Millstone. I also get "Error abnormal termination" when I do the last restart step.

I will look into getting the bigger volume next and I'll let you know. Thanks.

Zymergen-SBRUBAKER commented 4 years ago

Great, so I was able to expand the root partition. I then "undid" all the changes you had above. It appears to have restarted successfully. Thanks!!

glebkuznetsov commented 4 years ago

Ah, thanks for reminding me that it's now possible to expand the root partition. When we built Millstone it was not possible to do that on AWS!

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb! I got pretty far: I got the alignment to run. But now JBrowse is not working. It takes me to the screen shown in the attached screenshot.

I suspect this is related to me deleting the gd_data folder earlier. I tried putting things back by "undoing" the steps, so I created a gd_data folder in the right place, not a symlink. But I don't think that worked. Do you have any idea how I could fix this? Thanks.

glebkuznetsov commented 4 years ago

Hi Shane,

Indeed gd_data needs to be a symlink to the location where Millstone actually stores files, millstone/genome_designer/temp_data. JBrowse just displays actual bam files.

You should be able to fix this by removing the gd_data folder and recreating the symlink:

ln -s /home/ubuntu/millstone/genome_designer/temp_data /home/ubuntu/millstone/genome_designer/jbrowse/gd_data
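
The same repair can be sketched in Python. Assumptions: `relink_gd_data` and the `.bak` suffix are made up for illustration, and an existing real folder is renamed rather than deleted, which is more cautious than the removal described above.

```python
import os
from pathlib import Path


def relink_gd_data(link, target, backup_suffix=".bak"):
    """Make `link` a symlink to `target`, moving aside whatever is in the way."""
    link = Path(link)
    target = Path(target)

    # Refuse to create a dangling symlink.
    if not target.is_dir():
        raise FileNotFoundError(f"symlink target {target} does not exist")

    if link.is_symlink():
        # An old or broken symlink can simply be dropped.
        link.unlink()
    elif link.exists():
        # A real folder sits where the symlink should be: rename it, do not delete it.
        # This fails if the backup name is already taken by a non-empty directory.
        link.rename(link.with_name(link.name + backup_suffix))

    os.symlink(target, link)
    return link
```

With the paths from this thread, the call would be `relink_gd_data('/home/ubuntu/millstone/genome_designer/jbrowse/gd_data', '/home/ubuntu/millstone/genome_designer/temp_data')`.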

Zymergen-SBRUBAKER commented 4 years ago

Awesome, that worked! Thank you so much Gleb!

Shane

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb, I had another question for you. My Export on my project is hanging/failing. Do you know why that might be? It has about 3 samples and 3 alignments in it.

glebkuznetsov commented 4 years ago

Hi Shane,

That's surprising with so few samples, but hard for me to debug. Which strategy for export are you using?

-Gleb

Zymergen-SBRUBAKER commented 4 years ago

I am just using the Export from the main Project screen. Is there another method?

This is for E. coli. I have successfully exported a project before using some of the test data.

Could the size of the server matter?

glebkuznetsov commented 4 years ago

Hi Shane,

Got it. Indeed there might be some issue with exporting the entire project due to the instance size. It's not a feature we optimized.

However, what most users actually want to export is a .csv of all the called variants and metadata. That should work for your project; you just have to do it from the Analyze view. For example, in the public demo we host, you'd do it from this page: http://ec2-52-4-236-89.compute-1.amazonaws.com/projects/b4cbc454/analyze/e0f7b0c1/variants?filter=&melt=0#

Click the top checkbox that selects all.
A blue notification appears informing you that only 100 results are selected, so you probably want to press 'Select all results that match this filter.'
In the dropdown, select 'Export as csv', as shown in this screenshot: https://user-images.githubusercontent.com/233915/73292000-459a5780-41cf-11ea-9e47-db37f1e8fd48.png

The rest of the data (.fastq files, generated .bam files, etc.) is located in the Millstone filesystem (the temp_data folder discussed above), so you can browse or scp what you need from there. Folder names are software-generated uids, though, so you might need some extra detective work, or you can use the Django shell to query the database for them.
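
For the detective work, a short script can at least list where the read and alignment files ended up. This is a sketch, not a Millstone utility: `find_data_files` is a hypothetical helper, and the extension list is an assumption about which files are of interest.

```python
import os


def find_data_files(root, extensions=(".fastq", ".fastq.gz", ".bam")):
    """Yield (path, size_in_bytes) for every file under `root` with a matching extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # str.endswith accepts a tuple, so one call covers all extensions.
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                yield path, os.path.getsize(path)
```

Running `for path, size in find_data_files('/home/ubuntu/millstone/genome_designer/temp_data'): print(size, path)` prints one line per file; the uid folder names in each path still have to be matched to samples by hand or through the database.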

Let me know if that's what you were looking for.

Thanks! Gleb

Zymergen-SBRUBAKER commented 4 years ago

Hi Gleb! I wanted to let you know that I got my project export working - it was running out of disk space. It did create a rather large file, ~50GB.
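
A quick way to check for this up front is to compare the free space on the volume against the expected export size. A minimal sketch, where `has_room_for` is a hypothetical helper and the 50 GB figure is just the size observed here:

```python
import shutil


def has_room_for(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

For example, `has_room_for('/home/ubuntu', 50 * 1024**3)` checks for roughly 50 GB of free space before starting an export.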

I am getting a failure when running SV. Can you tell me how I would find the logs with the error?

Thanks, Shane

glebkuznetsov commented 4 years ago

Hi Shane,

Great to hear.

All of the Millstone-related logs get written to files in /var/log/supervisor, specifically either millstone-stdout.log or celery-stdout.log.
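
To narrow down a failure, it may be quicker to pull only the suspicious lines out of those files. A rough sketch, assuming (this is a guess, not something Millstone documents) that failures show up as lines containing Traceback, ERROR, or Exception; `find_error_lines` is a hypothetical helper:

```python
def find_error_lines(log_path, markers=("Traceback", "ERROR", "Exception")):
    """Return (line_number, text) pairs for lines containing any of the markers."""
    hits = []
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with open(log_path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(marker in line for marker in markers):
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, `find_error_lines('/var/log/supervisor/celery-stdout.log')` returns the matching lines with their line numbers; reading files under /var/log may require sudo.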

-Gleb

Zymergen-SBRUBAKER commented 4 years ago

Great I will take a look, thanks Gleb!
