biofuture / Ublastx_stageone

This is the source code for multi-sample ARG profiling using the SARG2.0 database.
45 stars, 42 forks

stage two #7

maynemarie closed this issue 5 years ago

maynemarie commented 6 years ago

Hi, my extracted.faa file from stage one was too big (1.6 GB), so I split it into 500 MB batches (4 separate .faa files). I uploaded them as a batch using the multiple-datasets function for the source FASTA file and used the same meta file generated by the stage one process. However, I keep getting an error message.

biofuture commented 6 years ago

Hi, you need to split the extracted.fa file into smaller files by sample, rather than by file size.
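A minimal Python sketch of splitting by sample. It assumes (hypothetically) that every header in extracted.fa starts with the sample ID followed by an underscore, e.g. `>S1_12345`; check your own headers before relying on this. The sample IDs are passed in explicitly, and when several IDs match a header the longest one wins. The function names `sample_of` and `split_by_sample` are illustrative and not part of Ublastx_stageone.

```python
import os
from typing import Dict, Iterable, Optional, TextIO


def sample_of(header: str, sample_ids: Iterable[str]) -> str:
    """Return the sample ID that prefixes the read name as '<ID>_'.

    Assumes (hypothetically) headers look like '>S1_12345 optional text'.
    """
    fields = header[1:].split()  # drop '>' and split off any description
    name = fields[0] if fields else ""
    matches = [s for s in sample_ids if name.startswith(s + "_")]
    if not matches:
        raise ValueError(f"no sample ID matches header {header!r}")
    return max(matches, key=len)  # prefer the most specific ID if several match


def split_by_sample(fasta_path: str, sample_ids: Iterable[str], out_dir: str) -> Dict[str, str]:
    """Write <out_dir>/<SampleID>.fa for every sample seen; return the paths."""
    ids = list(sample_ids)
    os.makedirs(out_dir, exist_ok=True)
    handles: Dict[str, TextIO] = {}
    paths: Dict[str, str] = {}
    current: Optional[TextIO] = None  # file receiving the current record
    try:
        with open(fasta_path) as fin:
            for line in fin:
                if line.startswith(">"):
                    sid = sample_of(line, ids)
                    if sid not in handles:
                        paths[sid] = os.path.join(out_dir, f"{sid}.fa")
                        handles[sid] = open(paths[sid], "w")
                    current = handles[sid]
                if current is not None:  # sequence lines follow their header
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

For example, `split_by_sample("extracted.fa", ["S1", "S2", "S10"], "per_sample")` would write `per_sample/S1.fa`, `per_sample/S2.fa` and `per_sample/S10.fa`. One output file per sample is kept open at a time, which is fine for a handful of samples.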

maynemarie commented 6 years ago

Hi, I've split my extracted.faa file into smaller files organized by sample. However, when I try to upload them as a batch, only one .faa file appears for selection rather than all 10 others. Would it be possible to provide code so we can run stage two locally?

biofuture commented 6 years ago

Dear Maynemarie

You need to upload them one by one; however, you can upload them simultaneously in different windows to run them as a batch.

Sorry, at the current stage only the online version is available; a local version may become available in the future. You can upload them to a mirror site:

http://smile.sustc.edu.cn:8080/

maynemarie commented 6 years ago

Hi Xiaotao,

I am still having difficulties with stage two. Some of the split FASTA files (split by sample) exceed 2 GB, and I'm getting an error message even with smaller files of 1 GB. What is the easiest way to fix this or to find out what's causing the error? Could you provide the code? Thanks.

maynemarie commented 6 years ago

If we split the FASTA file, do we also have to split the metadata file to match? Thanks

biofuture commented 6 years ago

Yes, they need to match. I suspect the two files do not match each other.
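To keep the two files matched, each per-sample FASTA file needs a meta-data file that lists only that sample. A sketch, assuming (hypothetically) that meta_data_online.txt is tab-separated, that its first non-blank line is a header, and that the column selected by `key_column` (the first one by default) holds the same ID that prefixes the FASTA headers. If your headers use a different column, change `key_column`. The function name and the `<ID>_meta.txt` naming scheme are illustrative only.

```python
import os
from typing import Dict


def split_metadata(meta_path: str, out_dir: str, key_column: int = 0) -> Dict[str, str]:
    """Write <out_dir>/<ID>_meta.txt containing the header plus one sample row."""
    with open(meta_path) as fin:
        lines = [ln.rstrip("\r\n") for ln in fin if ln.strip()]  # skip blank lines
    header, rows = lines[0], lines[1:]
    os.makedirs(out_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for row in rows:
        sample_id = row.split("\t")[key_column]
        if sample_id in paths:
            raise ValueError(f"duplicate sample ID {sample_id!r}")
        paths[sample_id] = os.path.join(out_dir, f"{sample_id}_meta.txt")
        with open(paths[sample_id], "w") as fout:
            fout.write(header + "\n" + row + "\n")
    return paths
```

Each output file keeps the original header line, so its format should stay the same as the combined file; only the set of rows changes.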

maynemarie commented 6 years ago

Would this affect the sample comparison, such as the PCoA output later, given that the FASTA files were run separately with matching metadata files?

maynemarie commented 6 years ago

Also, what should I do with sample files that are 2 GB and return an error?

biofuture commented 6 years ago

You can generate the PCoA by merging all the results; running the files separately won't affect it.
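A sketch of the merging step, so that one combined table can be fed to whatever PCoA tool you use. It assumes (hypothetically) that each stage-two result table is tab-separated, that its first column is the ARG type or subtype name, and that every remaining column is one sample. Features missing from a run are filled with 0, and a sample appearing in two tables is treated as an error. The function names are illustrative, not part of the pipeline.

```python
import csv
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]  # feature -> sample -> abundance


def merge_tables(paths: List[str]) -> Tuple[List[str], Table]:
    """Outer-join several feature-by-sample tables on the feature column."""
    samples: List[str] = []
    table: Table = {}
    for path in paths:
        with open(path, newline="") as fin:
            reader = csv.reader(fin, delimiter="\t")
            run_samples = next(reader)[1:]  # header: feature name, then samples
            for s in run_samples:
                if s in samples:
                    raise ValueError(f"sample {s!r} appears in more than one table")
            samples.extend(run_samples)
            for row in reader:
                if not row:
                    continue
                cell = table.setdefault(row[0], {})
                for s, v in zip(run_samples, row[1:]):
                    cell[s] = float(v)
    return samples, table


def write_merged(samples: List[str], table: Table, out_path: str) -> None:
    """Write the merged table, using 0.0 where a feature was absent in a run."""
    with open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        writer.writerow(["feature"] + samples)
        for feature in sorted(table):
            writer.writerow([feature] + [table[feature].get(s, 0.0) for s in samples])
```

Filling absent features with 0 is what makes the separately run samples comparable in one matrix; the distance metric and ordination are then chosen in the downstream tool.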

-- Regards, Dr. Xiao-Tao Jiang

Department of Civil Engineering, The University of Hong Kong

maynemarie commented 6 years ago

For sample files larger than 2 GB, should I split them, and can I merge the files from the same sample together afterwards?

biofuture commented 6 years ago

Hi

You had better not upload a 2 GB file; make it smaller. I suggest about 500 MB for each extracted.fa. If you have difficulty merging or splitting the files, please let us know and I may develop some scripts to help you.
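A sketch of cutting one oversized per-sample FASTA into pieces of roughly the suggested 500 MB. Chunks only break between records, so no sequence is cut in half. The `<prefix>.partN.fa` naming is hypothetical. Whether each part should be uploaded with the same metadata row, and how the per-part results should be recombined, is not settled in this thread, so confirm that with the maintainers before merging.

```python
from typing import Iterator, List, TextIO


def fasta_records(fin: TextIO) -> Iterator[str]:
    """Yield one FASTA record (header plus its sequence lines) at a time."""
    record: List[str] = []
    for line in fin:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def chunk_fasta(fasta_path: str, prefix: str, max_bytes: int = 500 * 1024 * 1024) -> List[str]:
    """Split one sample's FASTA into <prefix>.partN.fa files of at most ~max_bytes.

    A single record larger than max_bytes still gets a chunk of its own.
    """
    paths: List[str] = []
    out = None
    written = 0  # bytes written to the current chunk
    try:
        with open(fasta_path) as fin:
            for rec in fasta_records(fin):
                size = len(rec.encode())
                if out is None or written + size > max_bytes:
                    if out is not None:
                        out.close()
                    paths.append(f"{prefix}.part{len(paths) + 1}.fa")
                    out = open(paths[-1], "w")
                    written = 0
                out.write(rec)
                written += size
    finally:
        if out is not None:
            out.close()
    return paths
```

For example, `chunk_fasta("per_sample/S1.fa", "per_sample/S1")` would produce `per_sample/S1.part1.fa`, `per_sample/S1.part2.fa`, and so on. Concatenating the parts in order gives back the original file.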

Regards, Xiao-Tao Jiang

maynemarie commented 6 years ago

Is there any way I can send you a file as an example? Thanks again.

biofuture commented 6 years ago

Dear Maynemarie

If you split the file correctly and check the format of your meta_data_online.txt, you won't have any problems with stage two.
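A sketch of some basic checks worth running on meta_data_online.txt before uploading. It assumes (hypothetically) a tab-separated file with a header row, the sample ID in the first column, and FASTA headers starting with `<SampleID>_`. These checks are illustrative guesses at common problems (line endings, column counts, duplicate IDs, samples with no reads), not the server's actual validation rules.

```python
from typing import List, Set


def check_metadata(meta_path: str, fasta_path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    with open(meta_path, newline="") as fin:  # newline="" keeps '\r' visible
        raw = fin.read()
    if "\r" in raw:
        problems.append("Windows (CRLF) or old Mac line endings found")
    lines = [ln for ln in raw.replace("\r\n", "\n").split("\n") if ln.strip()]
    if len(lines) < 2:
        return problems + ["no sample rows after the header"]
    ncol = len(lines[0].split("\t"))
    if ncol < 2:
        problems.append("header is not tab-separated")
    ids: Set[str] = set()
    for i, line in enumerate(lines[1:], start=2):  # i is the 1-based line number
        fields = line.split("\t")
        if len(fields) != ncol:
            problems.append(f"line {i}: {len(fields)} columns, expected {ncol}")
        if fields[0] in ids:
            problems.append(f"line {i}: duplicate SampleID {fields[0]!r}")
        ids.add(fields[0])
    seen: Set[str] = set()
    with open(fasta_path) as fin:
        for line in fin:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    seen.update(s for s in ids if parts[0].startswith(s + "_"))
    for missing in sorted(ids - seen):
        problems.append(f"SampleID {missing!r} has no reads in the FASTA file")
    return problems
```

Running it on each matched pair of per-sample files (for example `check_metadata("S1_meta.txt", "S1.fa")`) would flag a mismatch before the upload rather than after the job fails.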

If you have any problems with the splitting, I can write a script to help you. You can send me your meta_data_online.txt file by email: biofuture.jiang@gmail.com

Do not send me the big FASTA file.

Best, Xiaotao