drivenbyentropy / aptasuite

A full-featured bioinformatics software collection for the comprehensive analysis of aptamers in HT-SELEX experiments.
https://drivenbyentropy.github.io/
GNU General Public License v3.0

GUI - works only once #12

Closed: PJpb closed this issue 6 years ago

PJpb commented 6 years ago

Hi, so I've tried to use the GUI to analyze real-data sequences from one of the rounds of my selection. The file is a quality-checked FASTQ produced after paired-read joining done elsewhere. I set up the experiment and ran the parsing, and it worked nicely. However, after setting up another experiment (even a similar one with slightly different parameters or a different folder), I get an error: the parsing completes within a split second, nothing is displayed in Experiment Overview (blank background), and when I switch to other tabs they keep loading indefinitely. I can, however, open the first experiment, which worked, and it's fine.

Please find the configuration file and the log attached. configuration.aptasuite_.txt log_2018-02-16_14-39-25.txt

I'm using aptasuite-0.5.3 on Windows 10.

edit: I've tried it on another computer with a fresh download and a freshly installed Java. No luck; it doesn't analyze the data even once. Parsing takes a split second and no results are produced.

drivenbyentropy commented 6 years ago

Please check if version v0.5.4 fixes this issue. I believe this was due to the configuration not being cleared correctly creating a mix between the old and the new experiment.

PJpb commented 6 years ago

Dear drivenbyentropy, unfortunately neither the GUI nor the command-line version of v0.5.4 is working properly now.

For the GUI: when I pressed "Import Data" for the first time, I instantly got "Parsing completed". Here are the files: log_2018-02-16_21-14-54.txt configuration.aptasuite.txt (configuration renamed to .txt due to GitHub requirements)

When I tried it for the second time (with different folder paths for output data), aptasuite shut down after pressing "Import data". log_2018-02-16_21-17-26.txt configuration.aptasuite.txt (Same source data file)

Also the command line version is not working. Please find the files attached. powershell_log.txt log_2018-02-16_21-08-59.txt cfg.txt (Same source data file)

drivenbyentropy commented 6 years ago

Yep, I think there was another bug which caused this behavior. Could you please try with version v0.5.5? This should now correctly open and close experiments.

Thanks

drivenbyentropy commented 6 years ago

Hmmm, this should however not affect the CLI version. I cannot reproduce this issue locally without the sequence file.

PJpb commented 6 years ago

Sorry, I was pretty sure I had uploaded the source file as well! Here it is: 1302_merged_1000seq_txt.txt (converted from fastq to txt for GitHub compatibility)

I'm testing v0.5.5 now.

PJpb commented 6 years ago

No luck with CLI (v0.5.5): cfg3.txt 1302_merged_1000seq_txt.txt log_2018-02-16_21-55-43.txt powershell_log.txt

PJpb commented 6 years ago

No luck with GUI either (v0.5.5), "Parsing completed" within a split second, no data output. As if it didn't see any sequences in the file? Source file: 1302_merged_1000seq_txt.txt Configuration created by aptasuite: configuration.aptasuite_txt.txt Log: log_2018-02-16_22-04-26.txt

drivenbyentropy commented 6 years ago

Thanks for the sequence file. I can now reproduce the error and I am looking into the cause. I suspect this might have something to do with the pre-processing done to the file as this is really the only difference between my tests and yours. Let me see what I can find.

Thanks for all your help!

drivenbyentropy commented 6 years ago

I figured it out. It's the encoding of the sequence file... The file is encoded in UCS-2 LE with a BOM, whereas AptaPlex expects UTF-8. Did you process the file on Windows?

Changing the encoding to UTF-8 works as expected.

1302_merged_1000seqUTF.txt

Edit: To be more precise, the file you sent me is encoded in UTF-16LE:

$ file -b -i 1302_merged_1000seq.txt
text/plain; charset=utf-16le

whereas the standard encoding for FASTQ files is assumed to be US-ASCII:

$ file -b -i 1302_merged_1000seqUTF.txt
text/plain; charset=us-ascii
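As a sketch of the conversion described above, the following hypothetical Java helper (not part of AptaSuite or AptaPlex; the class and method names are made up for illustration) reads a file, looks for a UTF-16 or UTF-8 byte order mark, and rewrites the content as UTF-8 without a BOM. The BOM bytes are skipped explicitly because Java's UTF-16LE and UTF-8 decoders do not strip a leading BOM on their own. It assumes the whole file fits in memory, which is fine for a 1000-read excerpt but not necessarily for a full sequencing run.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not AptaSuite code): detects a UTF-16 or UTF-8 byte order mark
 * and re-encodes the text as plain UTF-8 without a BOM.
 */
class FastqEncodingFix {

    /** Guesses the charset from the leading byte order mark; defaults to UTF-8. */
    static Charset detectBom(byte[] b) {
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return StandardCharsets.UTF_8;
    }

    /** Number of BOM bytes to skip for the detected charset. */
    static int bomLength(byte[] b, Charset cs) {
        if (cs.equals(StandardCharsets.UTF_16LE) || cs.equals(StandardCharsets.UTF_16BE)) {
            return 2;
        }
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return 3;
        }
        return 0;
    }

    /** Returns the contents re-encoded as UTF-8, with any BOM removed. */
    static byte[] toUtf8(byte[] raw) {
        Charset cs = detectBom(raw);
        int skip = bomLength(raw, cs);
        String text = new String(raw, skip, raw.length - skip, cs);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FastqEncodingFix <input> <output>");
            return;
        }
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), toUtf8(raw));
    }
}
```

Usage would be along the lines of `java FastqEncodingFix 1302_merged_1000seq.txt 1302_merged_1000seqUTF.txt`. Since FASTQ content is pure ASCII, the UTF-8 output is byte-identical to ASCII, which is why `file` reports `us-ascii` for the converted file.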

PJpb commented 6 years ago

Yes, I did: I cut out the first 4k lines of the original sequencing file (after QC and everything else done via the Galaxy server). I'll give it a try in a sec.

PJpb commented 6 years ago

Ok, so for CLI v0.5.6:

  1. Exactly as you said, changing to UTF-8 works fine! Thank you for pointing this out, and sorry for bothering you with such nonsense. It works for a file with merged, QCed reads: 1302_merged_utf8_1000seq_txt.txt cfg4.txt

  2. However, it still crashes for two pre-merged paired-end files (not groomed; raw data I got from the NGS supplier): 1302_raw_fwd_utf8_1000seq_txt.txt 1302_raw_rev_utf8_1000seq_txt.txt cfg7.txt log_2018-02-16_23-14-51.txt powershell_log.txt

I'll test the GUI now.

drivenbyentropy commented 6 years ago

Thanks for the update. The GUI should give the same results as both CLI and GUI call the same parsing library.

PJpb commented 6 years ago

Should ;) GUI v0.5.6 still does not parse anything, i.e. the parsing completes momentarily and there's no output whatsoever. Source file: 1302_merged_utf8_1000seq_txt.txt Aptasuite configuration: configuration_txt.txt (quite a lot of backslashes in file paths) Log: log_2018-02-16_23-25-09.txt

Also, when I tried File->Open Experiment with the configuration file from this run, aptasuite closed with no error message, but it did produce a log: log_2018-02-16_23-26-52.txt

drivenbyentropy commented 6 years ago

That is so strange, I cannot reproduce the GUI issue here, with the same files... Is your drive E: a network drive by any chance? If so, could you try running this with all data and experiment folder on a local drive?

PJpb commented 6 years ago

It's a local drive. Are you trying to reproduce it using Windows?

I'll try to run it using a virtual machine with a UNIX system.

drivenbyentropy commented 6 years ago

I'm running it on a Windows machine, yes. Can you start the GUI via Powershell on Windows (just run aptasuite without any parameters) and do the import again. Then paste the output of the console? It should at least indicate where it gets stuck.

drivenbyentropy commented 6 years ago

No issues running on my Linux setup either. GUI and CLI are working with the merged file. I am running out of ideas as to what might be the cause here.

PJpb commented 6 years ago

Sorry, I didn't see your comment edit. As requested ("Can you start the GUI via Powershell on Windows (just run aptasuite without any parameters) and do the import again. Then paste the output of the console? It should at least indicate where it gets stuck."), here is the console output: console_output.txt

WARNING: Loading FXML document with JavaFX API of version 9 by JavaFX runtime of version 8.0.161

Am I wasting your time again?...

drivenbyentropy commented 6 years ago

You are not wasting my time, on the contrary :).

The warning, however, has nothing to do with the bug. The console log, on the other hand, has some juicy stuff in it ;-).

Btw, the paired-end case should be fixed now as well --> v0.5.7

Thanks!

PJpb commented 6 years ago

My last try today, it's 1 AM ;) Will update this comment when I run it.

update: Indeed, v0.5.7 CLI works fine with paired end datafiles! Thank you!

drivenbyentropy commented 6 years ago

Thank you again for all your help and dedication, I really appreciate it :+1:

I will try to solve the remaining issues asap.

PJpb commented 6 years ago

I'm glad that you're not discouraged by me spamming you with the issues. You're really doing a great job!

I've just run the v0.5.7 GUI to import data, and the situation is a bit different: after hitting the button, some information does show up in gray (instead of "Parsing completed"), but then the whole application closes before I can even read it. I managed to take a screenshot, though.

Here are the files: configuration_txt.txt log_2018-02-17_01-03-16.txt console_log.txt Hope that helps. Keep up the good job!

drivenbyentropy commented 6 years ago

"Awesome" ;-)

This helps a lot, thank you. It seems that the only difference between my setup and yours is that you are running Windows 10 and I am running Windows 7. If you get the chance next week, could you report the outcome when running this in a Linux VM? I, on the other hand, will try to get my hands on a Windows 10 machine.

Again, thank you so much for your help and please do not hesitate to report any additional issues you may encounter.

Cheers!

PJpb commented 6 years ago

It looks like I owe you an apology for having wasted your time. It seems that E:\OneDrive\ is treated as a network drive. When I moved all the files to a folder on E:\, it seems to work. (Also, when running in a VM, the app and the data cannot be located on a mounted drive shared from the host; they have to be local as well. Tested on an Ubuntu 16.04 VM.) I will give it a couple more tries over the weekend anyway.

edit: OK, so it works locally (it's funny that Windows interprets local OneDrive data as a network drive; I intentionally switched OneDrive off yesterday). For GUI v0.5.7, data import works in all cases. It also opens a previously parsed experiment. Aptamer Pool loading still takes forever, clusters don't display in the middle window, and sorting cannot be changed. I'll give you the details in the other tickets. And sorry again for the network/local misunderstanding...

edit2: When I start the GUI from PowerShell run as administrator, it can open previously created experiments. However, when I open aptasuite.jar simply by double-clicking and try to open an experiment, it closes without any notice.

drivenbyentropy commented 6 years ago

No worries, I am glad we are getting to the bottom of it :)

I have run into similar issues before, and I believe this is all related to how files are buffered in memory when using a network drive (especially a cloud-based one). Here is what I believe happens: when you copy (or save) a file on such a drive, the OS reports that the I/O operation has completed before it actually has (files are synced in the background). This in turn is detrimental to my code, as it assumes a write operation has fully completed before continuing with the remaining logic, especially in the New Experiment Wizard, which relies heavily on I/O operations.
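To illustrate the assumption described above, here is a hypothetical Java sketch (not AptaSuite's actual code; `DurableWrite` and `writeDurably` are made-up names) of a more defensive write: the data goes to a temporary file in the target directory, `FileChannel.force(true)` asks the OS to push content and metadata to the local storage device, and only then is the file moved to its final name and its size checked. Treat this as a possible mitigation under the stated hypothesis, not a confirmed fix: `force()` only covers the local device and cannot tell you when a sync client such as OneDrive has finished its background upload.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper (not AptaSuite code): writes a file, forces it to the local
 * storage device, then publishes it under its final name.
 */
class DurableWrite {

    static void writeDurably(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Temporary file in the same directory, so the final move stays on one volume.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Block until content and metadata have been handed to the local device.
            ch.force(true);
        }
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } catch (AtomicMoveNotSupportedException e) {
            // Fall back to a plain replace on file systems without atomic rename.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // Sanity check before the caller continues with logic that depends on the file.
        if (Files.size(target) != data.length) {
            throw new IOException("size mismatch after write: " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "example.txt");
        writeDurably(out, "example\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

The write-to-temp-then-move pattern means later code never sees a half-written file under the final name; whether this is enough on a cloud-synced folder depends on the sync client, which is why keeping data and experiment folders on a plain local drive remains the simpler workaround.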

Edit2 is interesting. How was the first experiment created: using PowerShell as superuser or by double-clicking? If possible, could you test the following scenarios?

I am testing the development branch almost exclusively using Powershell when on Windows but I believe it is not running as superuser.

Thanks!

drivenbyentropy commented 6 years ago

Can this ticket be considered fixed as well now that #14 is closed?

PJpb commented 6 years ago

Seems to work as well.