data61 / gossamer

Gossamer bioinformatics suite
Other

xenome classify hangs #9

Open mjafin opened 7 years ago

mjafin commented 7 years ago

Hi there, I'm testing xenome classify on an AWS instance (latest Ubuntu) and it hangs after about 50 minutes. The command I used to launch the process is:

xenome classify -T 8 -M 28 -P /data/Miika/idx --pairs -i dna/SRR1176814_1.fastq.gz -i dna/SRR1176814_2.fastq.gz --output-filename-prefix SRR1176814 -v > output_stats_SRR1176814.txt;

This is where it stops:

...
Tue Jan  3 10:21:59 2017        info    46700000 reads
Tue Jan  3 10:22:05 2017        info    46800000 reads
Tue Jan  3 10:22:12 2017        info    46900000 reads
Tue Jan  3 10:22:18 2017        info    47000000 reads
Tue Jan  3 10:22:24 2017        info    47100000 reads
Tue Jan  3 10:22:31 2017        info    47200000 reads
Tue Jan  3 10:22:37 2017        info    47300000 reads

The sample (SRR1176814) has 47312349 reads so it looks like it's getting to the end but then nothing happens. The process is still visible in top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
32605 ubuntu    20   0  672200  13852   5756 S   0.0  0.0 403:05.52 xenome

output_stats_SRR1176814.txt is empty.

Any ideas?

Deguerre commented 7 years ago

Far too many ideas. Need more information to narrow it down.

First off, please confirm that all the unit tests succeeded on this platform.

Secondly, can you confirm that you tried running it more than once and got the same behaviour? If the bug is intermittent, that narrows down the possibilities.

Thirdly, a little bit of information. Could you please show me the output of:

ls -l dna/SRR1176814*.gz
ls -l /data/Miika/idx*

Finally, let's try to create a cut-down test case. Could you please try this?

gunzip -c dna/SRR1176814_1.fastq.gz | head -n 4000 | gzip -9 -c > dna/test_1.fastq.gz
gunzip -c dna/SRR1176814_2.fastq.gz | head -n 4000 | gzip -9 -c > dna/test_2.fastq.gz

Then run xenome classify on dna/test_1.fastq.gz and dna/test_2.fastq.gz using the same options.

If that hangs too (should be much quicker), then please send us the cut-down input files. Either attach them to the ticket, or (if you can't let the public see them) email them to me.

mjafin commented 7 years ago

Unit tests were fine when I compiled gossamer.

I ran several samples yesterday and all showed the same behaviour.

Here's the ls output:

ls -l dna/SRR1176814*.gz
-rw-rw-r-- 1 ubuntu ubuntu 4541431597 Jan  1 11:15 dna/SRR1176814_1.fastq.gz
-rw-rw-r-- 1 ubuntu ubuntu 4261961919 Jan  1 11:16 dna/SRR1176814_2.fastq.gz

and

ls -l /data/Miika/idx*
-rw-rw-r-- 1 ubuntu ubuntu         24 Jan  1 20:50 /data/Miika/idx-both.header
-rw-rw-r-- 1 ubuntu ubuntu  169401184 Jan  1 20:50 /data/Miika/idx-both.kmers-d0
-rw-rw-r-- 1 ubuntu ubuntu  291033728 Jan  1 20:50 /data/Miika/idx-both.kmers-d1
-rw-rw-r-- 1 ubuntu ubuntu         64 Jan  1 20:50 /data/Miika/idx-both.kmers.header
-rw-rw-r-- 1 ubuntu ubuntu 1095959376 Jan  1 20:50 /data/Miika/idx-both.kmers.high-bits
-rw-rw-r-- 1 ubuntu ubuntu 8945415320 Jan  1 20:50 /data/Miika/idx-both.kmers.low-bits.lwr
-rw-rw-r-- 1 ubuntu ubuntu 4472707660 Jan  1 20:50 /data/Miika/idx-both.kmers.low-bits.upr
-rw-rw-r-- 1 ubuntu ubuntu  559088464 Jan  2 12:39 /data/Miika/idx-both.lhs-bits
-rw-rw-r-- 1 ubuntu ubuntu  559088464 Jan  2 12:39 /data/Miika/idx-both.rhs-bits
-rw-rw-r-- 1 ubuntu ubuntu         24 Jan  1 17:05 /data/Miika/idx-graft.header
-rw-rw-r-- 1 ubuntu ubuntu   92224224 Jan  1 17:05 /data/Miika/idx-graft.kmers-d0
-rw-rw-r-- 1 ubuntu ubuntu  141325168 Jan  1 17:05 /data/Miika/idx-graft.kmers-d1
-rw-rw-r-- 1 ubuntu ubuntu         64 Jan  1 17:05 /data/Miika/idx-graft.kmers.header
-rw-rw-r-- 1 ubuntu ubuntu  565383992 Jan  1 17:05 /data/Miika/idx-graft.kmers.high-bits
-rw-rw-r-- 1 ubuntu ubuntu 4751176496 Jan  1 17:05 /data/Miika/idx-graft.kmers.low-bits.lwr
-rw-rw-r-- 1 ubuntu ubuntu 2375588248 Jan  1 17:05 /data/Miika/idx-graft.kmers.low-bits.upr
-rw-rw-r-- 1 ubuntu ubuntu         24 Jan  1 19:33 /data/Miika/idx-host.header
-rw-rw-r-- 1 ubuntu ubuntu   80819808 Jan  1 19:33 /data/Miika/idx-host.kmers-d0
-rw-rw-r-- 1 ubuntu ubuntu  143517136 Jan  1 19:33 /data/Miika/idx-host.kmers-d1
-rw-rw-r-- 1 ubuntu ubuntu         64 Jan  1 19:33 /data/Miika/idx-host.kmers.header
-rw-rw-r-- 1 ubuntu ubuntu  532106144 Jan  1 19:33 /data/Miika/idx-host.kmers.high-bits
-rw-rw-r-- 1 ubuntu ubuntu 4218730922 Jan  1 19:33 /data/Miika/idx-host.kmers.low-bits.lwr
-rw-rw-r-- 1 ubuntu ubuntu 2109365461 Jan  1 19:33 /data/Miika/idx-host.kmers.low-bits.upr

Building the index using 8 cores took almost a day but it did finish OK.

I ran a test using the first 1000 reads as you suggested and it finished OK, which is odd. Could this have something to do with how things are parallelised and how the threads communicate? Here's the command I used:

xenome classify -T 8 -M 28 -P /data/Miika/idx --pairs -i dna/test_1.fastq.gz -i dna/test_2.fastq.gz --output-filename-prefix test -v > test.txt

mjafin commented 7 years ago

As this is all public data, if you're interested you can download the fastq.gz files here https://www.ebi.ac.uk/ena/data/view/SRR1176814

It looks like all the output fastq files are produced correctly, though, so I was able to pull together the stats I needed for my comparison in v2 of https://f1000research.com/articles/5-2741/v1. The results are very similar to those of our alignment-based algorithm.

billnjcn111 commented 7 years ago

Same issue here. I also tried a small FASTQ file and it still did not work. I also tried different options (e.g., a single thread, a single input file), with the same result. Can someone look into this? Attached: test_01.fastq.gz, test_02.fastq.gz

Thanks

Deguerre commented 7 years ago

Thanks for that. The information from ls also ruled out the old gzipped-file-is-an-exact-multiple-of-the-io-buffer-size issue that we found in a very old version of the gzip filter.

I suspect it's the job manager. We'll take a look.

billnjcn111 commented 7 years ago

Did you find any clue yet? Thanks

zz2liu commented 7 years ago

Same issue here, xenome classify hangs after the work is finished.

Deguerre commented 7 years ago

I just got back from holidays. Picking this up again.

billnjcn111 commented 7 years ago

Deguerre, how is it going? Have you found any clue how to fix the bug? Thx

danielgerlach commented 7 years ago

I see the same problem with my data: xenome classify just hangs after all output files are generated. Does anyone have updates on this?

Thanks

serverhorror commented 7 years ago

Anything? I'm also seeing the same problem here.

murphycj commented 7 years ago

Yes, I get the same issue. It did not happen when I tested xenome on a small test sample of reads (~4000), but when I ran it on all my samples (tens of millions of reads per sample) then it hangs after completion (or what looks like completion).

Deguerre commented 7 years ago

Just as an update, this is turning out to be a very nasty problem caused by a mismatch between two different threading models. We decided that for the open source release we should use the standard C++ threading system rather than our previous solution which we couldn't easily maintain. The hanging is caused by some of the old code relying on some detail in the previous model that nobody can remember because it was written so long ago.

Only the kmer set construction in Xenome seems to be affected. Everything else seems to work.

None of the Gossamer authors are being paid to work on this, so we have to work on it around our day jobs. As you all probably have worked out, the problem only happens on large examples, which means each individual test takes a while.

I am only speaking on behalf of myself, but I'm sure the other authors agree that we're very sorry about this, and we all want to get this finished as quickly as possible so everyone can use it. Please bear with us.

murphycj commented 7 years ago

Thanks for working toward fixing this!

kannabirannandakumar commented 7 years ago

I am facing the same issue. Has this bug been fixed now?

bonohu commented 7 years ago

I've been running xenome classify for a week on macOS Sierra (10.12.6). Is this a macOS-specific issue?

jasonwork9941 commented 7 years ago

I'm also experiencing the same issue and looking forward to the next fix.

In the meantime I'm trying to whittle down the original read files to see how large a file will still run without hanging. I'm down to 0.5% (yes, half of a percent) of the original, which equates to two read files of around 150 MB uncompressed, and it still hangs.

Just curious: what is the largest file anyone has been able to run without hanging, and on what setup?

maheetha commented 7 years ago

Same issue. I've been using your Xenome program for a patient xenograft sample. Your algorithm works well; however, I've been running it through bsub, and it seems to run forever, even though the output files have been written and look finished. For example, for a 100,000-read FASTQ input, it takes about a day to complete the indexing, and writing the different .fastq outputs is relatively quick after that. I checked the files, and the read counts of all five outputs (ambiguous, neither, both, mouse, and human) add up to 100,000 almost immediately after the files are created (within 20 minutes at most), but for some reason the program keeps running. After two days, the jobs are still "running".

To be honest, I don't mind much, and I'm willing to write code around it to make sure the files add up to the original read count, as long as the output is accurate. Can anyone confirm that their output is accurate? Mine seems to reach the target accuracy, so I'm assuming that once xenome has "finished", the output is accurate and complete.
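
Along those lines, a completeness check can be sketched in POSIX sh. It assumes 4-line FASTQ records, a single-end run, and an output folder holding only this sample's *.fastq files; the function names are hypothetical. Matching totals only show that no reads were lost, not that the classification is accurate.

```sh
# Minimal sketch: check that the per-class FASTQ files of a single-end
# xenome classify run hold as many reads as the input. Assumes 4-line
# FASTQ records and that OUTDIR contains only this sample's *.fastq outputs.

# Print the number of FASTQ records in a plain or gzipped file.
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $((lines / 4))
}

# Print the total number of records across all *.fastq files in a directory.
sum_output_reads() {
    total=0
    for f in "$1"/*.fastq; do
        [ -e "$f" ] || continue   # glob matched nothing
        total=$((total + $(count_reads "$f")))
    done
    echo "$total"
}

# Succeed only if the outputs in OUTDIR ($2) account for every read in INPUT ($1).
outputs_complete() {
    expected=$(count_reads "$1")
    actual=$(sum_output_reads "$2")
    echo "input reads: $expected, output reads: $actual"
    [ "$expected" -eq "$actual" ]
}
```

For example, running outputs_complete sample_R1.fastq.gz . in the output folder prints both totals and exits 0 only when they match, which is the point at which killing a hung classify should be safe. For a --pairs run, the mate-1 input would need to be compared against the mate-1 outputs only.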

obwan74 commented 6 years ago

I can confirm this too. After running for about 20 minutes, classify hangs. In my case, the logs show that after processing ~10 million reads, classify does not advance. Interestingly, the input FASTQ read count matches the output read count. I have to kill each process after 30 minutes. The program does its job but cannot finish.

Indexing also took indefinitely long. I experimented with the -M and -T parameters and got it to work in about 8 hours. Although I could have used over 124 GB of memory with 16 threads on the cluster, I used 64 GB and 12 threads to make it work. This choice is arbitrary, with no explanation. I wish the documentation were more detailed and clear. But since xenome in gossamer works, I can move on to the next step.

splaisan commented 6 years ago

(My apologies for the now-deleted post: I had used -I instead of -i for classify with FASTQ data, and it just hung without reporting an error.)

I could now run a classify job but after processing the 2000000 reads it does not exit. Can I safely kill it manually?

zz2liu commented 6 years ago

I found a workaround:

Hope it helps.

serverhorror commented 6 years ago

It doesn't exit at all. We've stopped using it since it didn't work for us.

splaisan commented 6 years ago

True, but it does the job in our case; what you miss is a clean end to the run and the stats. We compared it to other tools and found that it does what it is expected to do. It's only a pity that the developers don't put time into fixing the exit issue.

rushikapandya commented 6 years ago

Hello,

I'm facing a similar issue. I'm running xenome classify on my computer, but it hangs, or takes a really long time to run.

xenome classify -v -T 8 -M 8 -P ../../reference_sequence/xenome_idx/idx -i sample_R1.fastq.gz --output-filename-prefix sample_merged
Fri Jun 29 12:20:01 2018    info    opening buffer 0 /var/folders/mp/xd5y68q53zjdvvr81k8d30g40000gp/T//1530300001-47497-0-classbuf-0
Fri Jun 29 12:20:01 2018    info    performing 2 passes
Fri Jun 29 12:20:01 2018    info    parsing sequences from 44279_11_merged_R1.fastq.gz
Fri Jun 29 12:20:01 2018    info    writing to 44279_11_merged_neither.fastq
Fri Jun 29 12:20:01 2018    info    writing to 44279_11_merged_both.fastq
Fri Jun 29 12:20:01 2018    info    writing to 44279_11_merged_graft.fastq
Fri Jun 29 12:20:01 2018    info    writing to 44279_11_merged_host.fastq
Fri Jun 29 12:20:01 2018    info    writing to 44279_11_merged_ambiguous.fastq
Fri Jun 29 12:20:01 2018    info    pass 0
Fri Jun 29 12:20:01 2018    info    parsing sequences from 44279_11_merged_R1.fastq.gz

It seems to be stuck. The process is visible in top but shows up as sleeping.

Processes: 366 total, 2 running, 5 stuck, 359 sleeping, 1657 threads                       12:26:52
Load Avg: 2.02, 2.04, 2.14  CPU usage: 3.14% user, 2.17% sys, 94.68% idle
SharedLibs: 172M resident, 35M data, 12M linkedit.
MemRegions: 114419 total, 2116M resident, 64M private, 3151M shared.
PhysMem: 8167M used (1922M wired), 22M unused.
VM: 1646G vsize, 1097M framework vsize, 102331040(0) swapins, 106855192(0) swapouts.
Networks: packets: 409129203/172G in, 607434675/394G out.
Disks: 98600050/3125G read, 21012651/1205G written.

PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT MEM    PURG   CMPRS  PGRP  PPID  STATE
97228  xenome       0.0  06:17.01 7     0    19    12K    0B     4436K  97228 50512 sleeping

I'd really appreciate your help on this!

splaisan commented 6 years ago

I do not think we will ever get help on this... In our experience the job is done at that stage. You can kill it and check that the sum of the number of reads in all outputs matches the input. What we miss is the summary table, but you can produce that from the outputs too.
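
Rebuilding that summary table might look like this POSIX sh plus awk sketch. It reads the <prefix>_<class>.fastq files named in the "writing to ..." log lines above, assumes 4-line records, and prints a count and percentage per class; the function name and table layout are our own, not xenome's statistics format.

```sh
# Minimal sketch: per-class read counts and percentages from the
# <prefix>_<class>.fastq files of a killed single-end run.
# Assumes 4-line FASTQ records; the table layout is illustrative only.
summarise_outputs() {
    prefix=$1
    for f in "${prefix}"_*.fastq; do
        [ -e "$f" ] || continue
        class=${f#"${prefix}_"}   # strip "<prefix>_"
        class=${class%.fastq}     # strip ".fastq"
        echo "$class $(( $(wc -l < "$f") / 4 ))"
    done | awk '
        { name[NR] = $1; count[NR] = $2; total += $2 }
        END {
            print "count\tpercent\tclass"
            for (i = 1; i <= NR; i++)
                printf "%d\t%.2f\t%s\n", count[i], (total ? 100 * count[i] / total : 0), name[i]
        }'
}
```

For example, summarise_outputs 44279_11_merged would tabulate the five files from the run logged above, once xenome has been killed.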

rushikapandya commented 6 years ago

Hello,

Thank you for your reply. But unfortunately my reads don't add up. I tried running it again using different -M and -T settings and it seems to be running though very slowly!!!

splaisan commented 6 years ago

Use N threads and at least 4 GB of RAM per thread, or better 6 to 8 GB per thread if you have human+mouse data. Also try with 10M reads first to see if that works. Good luck.
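
That rule of thumb, plus the advice to trial 10M reads first, can be sketched as two small shell functions. Both names are hypothetical; the GB-per-thread figures are the suggestion above rather than documented requirements, and reading -M as gigabytes is an assumption based on the commands in this thread.

```sh
# Minimal sketch of the sizing advice above (a user's heuristic, not a
# documented xenome requirement; -M as gigabytes is an assumption).

# Suggest a -M value: threads x GB per thread (default 6, the low end of
# the 6-8 GB suggested for human+mouse data).
suggest_mem_gb() {
    echo $(( $1 * ${2:-6} ))
}

# Copy the first N reads (4 lines each) of a gzipped FASTQ ($1) into a new
# gzipped FASTQ ($3), for a quick trial run before the full data set.
head_reads() {
    gzip -dc "$1" | head -n $(( $2 * 4 )) | gzip -c > "$3"
}
```

For example, suggest_mem_gb 8 prints 48, and head_reads sample_R1.fastq.gz 10000000 test_R1.fastq.gz writes a 10M-read trial file to pass to xenome classify with -T 8 -M 48. For paired data, cut both mates with the same N so they stay in step.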

romanhaa commented 6 years ago

Does anybody know of a workaround? I installed Xenome in a Docker container (Ubuntu 16.04) and indexing worked fine but the classify step hangs after parsing all reads without any particular message just like it did for all of you. I tried with FASTQ files containing either 25k or 2 million reads and both failed. Any alternative tool to use or other ideas?

kannabirannandakumar commented 6 years ago

I have an older version of xenome that does not have this issue. Let me know if you need it, I can share.

romanhaa commented 6 years ago

@kannabirannandakumar yes that would be fantastic!

jiao2017 commented 6 years ago

@kannabirannandakumar I need the older version too, can you share it?

romanhaa commented 6 years ago

A colleague provided me a pre-compiled version that works on my system. I have no idea where it comes from and what is different, but you can download it from this link: https://drive.google.com/file/d/1AAmFKT5huWJ6H_8liFsuqFLRSdJBq10b/view?usp=sharing

Credit goes to the authors (obviously).

Deguerre commented 6 years ago

What is different is that it was compiled with an archaic version of the C++ standard and an older version of Boost. There is an incompatibility with modern C++ that we haven't had a chance to fix yet. All we know so far is that it's a very subtle threading semantics issue and it isn't anything simple.

As one of the authors, I have no problem with people sharing the old compiled binary until we fix it, given that NICTA is no more, especially if you do it by directing people here to the explanation. That version is, after all, the version about which all published claims are made!

Having said that, I don't own it and I don't speak for Data61 (the current owners).

jeffpkamp commented 5 years ago

I've worked around this issue by checking for the program to complete and then killing it. The best way I've found to detect completion is that xenome stops writing to the output FASTQ files. I run xenome in the background in its own folder, then run the following script to check the modification times of the files it is writing to. If none of them have been written to in 120 seconds, the script kills xenome and moves on.

# Start xenome classify in the background with "&" first, from its own folder
sleep 60
while true
do
        n=0
        now=$(date +%s)
        for x in *.fastq
        do
                # Count outputs not modified for over 120 seconds (GNU date -r)
                mod=$(date +%s -r "$x")
                if [[ $((now - mod)) -gt 120 ]]
                then n=$((n + 1))
                fi
        done
        echo "$n files not updated"
        # A single-end run writes 5 files: neither, both, graft, host, ambiguous
        if [[ $n -gt 4 ]]
                then break
                else sleep 10
        fi
done

killall -r xenome
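
A variant of this watchdog that kills only the process it started, and only once every output file has gone quiet, might look like the sketch below. It assumes GNU date (for date -r FILE) and a fresh output folder per run; the function names and arguments are hypothetical.

```sh
# Minimal sketch of a per-run watchdog: kill only the PID we started,
# once every *.fastq in DIR has been idle for IDLE seconds.
# Requires GNU date ("date -r FILE"). Use a fresh folder per run, otherwise
# leftover outputs from an earlier run look idle immediately.

# Succeed if DIR ($1) holds at least one *.fastq and none changed in the last $2 seconds.
outputs_idle() {
    now=$(date +%s)
    seen=0
    for f in "$1"/*.fastq; do
        [ -e "$f" ] || continue
        seen=1
        [ $((now - $(date -r "$f" +%s))) -ge "$2" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Usage: run_with_watchdog DIR IDLE_SECONDS POLL_SECONDS command [args...]
run_with_watchdog() {
    wd_dir=$1 wd_idle=$2 wd_poll=$3
    shift 3
    "$@" &
    wd_pid=$!
    # Poll while the background process is still alive.
    while kill -0 "$wd_pid" 2>/dev/null; do
        if outputs_idle "$wd_dir" "$wd_idle"; then
            echo "outputs idle for ${wd_idle}s, stopping PID $wd_pid" >&2
            kill "$wd_pid"
            break
        fi
        sleep "$wd_poll"
    done
    wait "$wd_pid" 2>/dev/null || true   # reap it; a killed process exits non-zero
}
```

For example, run_with_watchdog . 120 10 xenome classify -T 8 -P idx -i sample.fastq.gz, started from an empty folder, would stop that one xenome process after two idle minutes. If xenome is wrapped in singularity exec, the PID belongs to the wrapper, so it is worth checking that the signal actually reaches xenome.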

vedellpt commented 5 years ago

I experienced something similar to this when using fastq.gz as the input. Then, I decompressed the fastq files and used the resulting .fastq files as the input. That worked.
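
That workaround can be sketched as a small helper that writes a plain .fastq next to each .fastq.gz (keeping the original) and prints the new names. The function name is hypothetical, and the claim that decompressing avoids the hang rests on this one report.

```sh
# Minimal sketch: decompress each .fastq.gz argument to a plain .fastq
# beside it, keep the .gz, and print the new file names.
decompress_inputs() {
    for gz in "$@"; do
        out=${gz%.gz}
        gzip -dc "$gz" > "$out" || return 1
        echo "$out"
    done
}
```

For example, decompress_inputs dna/test_1.fastq.gz dna/test_2.fastq.gz prints dna/test_1.fastq and dna/test_2.fastq, which can then be passed to xenome classify with -i. The decompressed copies need considerably more disk space than the .gz files.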

vigneshravi commented 5 years ago

I have the same issue. I'm trying to run xenome on a paired-end test FASTQ dataset: R1 has 911079 reads and R2 has 911079 reads.

The "classify" job has been running for over 13 days and is still going. However, it emitted a set of output files within the first 18 hours of the job. Is this normal? In your experience, how long does it take to process a million reads? Indexing the human and mouse references was successful and took only 4 days.

Haz1y commented 1 year ago

@romanhaa Sorry to bother you, but the link you provided no longer works, and I would like the old version of Xenome you shared.

Haz1y commented 1 year ago

@kannabirannandakumar I also need the older version of Xenome that does not have this issue; it would be nice if a Docker image were available.

romanhaa commented 1 year ago

You can extract the binary from the following Docker image: https://hub.docker.com/r/romanhaa/xenocell

I don't remember the path off the top of my head but it should be easy to find.

mdozmorov commented 1 year ago

This issue remains current. The solution provided by @jeffpkamp at https://github.com/data61/gossamer/issues/9#issuecomment-436763548 is a workaround. To make it work for multiple samples, I modified it as follows:

# https://hub.docker.com/r/repbioinfo/xenome.2017.01/tags
# singularity pull repbioinfo/xenome.2017.01
SIF=/home/user/data/TestData/xenome.2017.01_latest.sif
THREADS=8
HS=/home/user/data/ExtData/UCSC/hg38/hg38.fa
MM=/home/user/data/ExtData/UCSC/mm39/mm39.fa
# Index genomes, one time
# singularity exec ${SIF} xenome index -T ${THREADS} -P idx -H ${MM} -G ${HS}

DIRIN=/home/user/data/WorkData/RNA-seq/00_raw
DIROUT=/home/user/data/WorkData/RNA-seq/00_raw_xenome
mkdir -p ${DIROUT}

# Single-end
for file in $(find ${DIRIN} -type f -name "*.fastq.gz" | grep -v Undetermined); do
  # Process individual samples
  SAMPLE=$(basename ${file} .fastq.gz)
  # Run xenome in the background, note "&"
  singularity exec ${SIF} xenome classify -T ${THREADS} -P idx --host-name mouse --graft-name human -i ${file} &
  # Monitor the file update times
  sleep 1
  while true
  do
          n=0
          now=$(date +%s)
          for x in *.fastq
          do
                  # Skip if no output exists yet; GNU date -r gives the mtime
                  [[ -e $x ]] || continue
                  if [[ $((now - $(date +%s -r "$x"))) -gt 120 ]]
                  then n=$((n + 1))
                  fi
          done
          echo "$n files not updated"
          if [[ $n -gt 4 ]]
                  then break
                  else sleep 10
          fi
  done
  # Terminate xenome and only then run next commands, note ";"
  killall -r xenome ;
  # Move the default output files to sample-specific names in the output folder
  mv human.fastq ${DIROUT}/${SAMPLE}_human.fastq
  mv mouse.fastq ${DIROUT}/${SAMPLE}_mouse.fastq
  mv ambiguous.fastq ${DIROUT}/${SAMPLE}_ambiguous.fastq
  mv both.fastq ${DIROUT}/${SAMPLE}_both.fastq
  # neither.fastq is written too (reads matching neither genome); keep it as well
  mv neither.fastq ${DIROUT}/${SAMPLE}_neither.fastq
  # Wait 2 seconds and only then resume the loop, note ";"
  sleep 2 ;
done

gzip ${DIROUT}/*.fastq

Deguerre commented 1 year ago

I recommend that everyone uses one of the above workarounds.

The Xenome authors don't get paid to work on this, so our time is limited. We have decided that releasing a new version, updated to suit the realities of modern hardware and modern C++, is a better use of that limited time than fixing this version. The fix for this bug is to replace one of the offending algorithms rather than to patch this branch.

I am looking for beta testers. If you've seriously used Xenome, and would be interested in helping out, we would appreciate it.

mdozmorov commented 1 year ago

Re: beta testers - I'll be interested. We analyze a lot of PDX data, filter mouse reads by aligning to the combined genome, but would rather use and cite Xenome.

Deguerre commented 1 year ago

Excellent! What's the best way to contact you that isn't quite this public?

EDIT: Your email at vcu dot edu?

mdozmorov commented 1 year ago

That's correct.

tamuanand commented 11 months ago

Hi @mdozmorov and @Deguerre, just curious: are there any updates on this?

tamuanand commented 11 months ago

I know that @Deguerre mentioned that xenome authors do not get paid to work on these and their time is limited. It is a very fair and valid comment. With this in mind, one thing to note:

mdozmorov commented 11 months ago

Interesting tool, the paper introduces others. I worked with https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh, but it is slow and also not maintained.

splaisan commented 11 months ago

Hi @mdozmorov, this is not the official BBMap repo; the best way to get help from Brian Bushnell is through SEQanswers.

mdozmorov commented 11 months ago

Thanks, @splaisan, SourceForge and the BBtools website show active development. I hope Xenome development will resume as well.

tamuanand commented 11 months ago

FWIW, I tried xengsort; it is much faster and gives results concordant with xenome.