ConesaLab / tappAS

This repository contains the source code for the tappAS application. See README for details.
https://app.tappas.org/
GNU General Public License v3.0

Unable to create a project (time course with multiple groups) #48

Closed rugilemat closed 7 months ago

rugilemat commented 1 year ago

Hi,

I have been trying to set up a tappAS project for ONT long read data with no luck.

I keep getting the same error message:

0:55 - Checking design file: /Users/rugile/Library/CloudStorage/OneDrive-King'sCollegeLondon/PhD/Experiments/Deep read seq/Data files/trial_matrix.tsv
0:55 - Design file passed initial check.
0:55 - Checking specified matrix file (Time_Course_Multiple): /Users/rugile/Library/CloudStorage/OneDrive-King'sCollegeLondon/PhD/Experiments/Deep read seq/Data files/talon_trial.tsv
0:55 - Matrix file passed initial check.
0:55 - The custom annotation file has been modified. We are going to load the version you used to create the project.
0:55 - Annotation loaded correctly. Check if the project contains the correct annotation or you want to create a new one.
0:55 - Input Data dialog results: tappas.DlgInputData$Params@41d1a543
0:55 - Process Input Data script is running...
0:55 - Processing expression matrix file: /Users/rugile/Library/CloudStorage/OneDrive-King'sCollegeLondon/PhD/Experiments/Deep read seq/Data files/talon_trial.tsv...
0:55 - Experiment type: Multiple Series Time-Course
0:55 - Matrix has 4 experimental groups, 2 time events, and a total of 54 samples.
0:55 - Sample column mapping: 12, 14, 16, 18, 20, 22, 24, 26, 28, 13, 15, 17, 19, 21, 23, 25, 27, 29, 30, 32, 34, 36, 38, 40, 42, 44, 46, 31, 33, 35, 37, 39, 41, 43, 45, 47, 48, 50, 52, 49, 51, 53, 54, 56, 58, 60, 62, 64, 55, 57, 59, 61, 63, 65
0:55 - [ERROR] Unable to copy expression matrix input file: 65
0:55 - [ERROR] Unable to process Expression Matrix file.

I have double and triple-checked that my design file and experimental matrix file column names match but I really don't understand what might be going wrong. Any advice would be super appreciated.

almart7 commented 1 year ago

Hello @rugilemat, thanks for using tappAS!

You can find more information about the file structures (design file and expression matrix) on the website at https://app.tappas.org/overview/ (Project section). The tappAS website also has a tutorials section and a FAQs section. The first FAQ explains how to find additional documentation for tappAS and contains links to the ISMB tutorial video and data files.

I hope this is useful, but do not hesitate to contact us with more information about your files if you are still having problems.

Kind regards,

rugilemat commented 1 year ago

Hi @almart7,

I have been following the documentation but have had no luck. I am attaching my counts file and design matrix. If relevant, I'm running R 4.2.2 on macOS (Intel).

Thanks!

Archive.zip

almart7 commented 1 year ago

Dear rugilemat,

I have been checking your files and there is one thing that is not clear to me. Your error message said: 'Matrix has 4 experimental groups, 2 time events, and a total of 54 samples'

But the design matrix you sent has 1 experimental group, 2 time events, and a total of 18 samples.

Is this correct? Also, which annotation file are you using?
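To compare those counts without loading the project, something like the following could be run on the design file and the matrix header. It is only a sketch: the column names `Sample`, `Time` and `Group` and the function name `summarize_design` are assumptions for illustration, not taken from the tappAS source, so adjust them to the headers in your own file.

```python
import csv
import io


def summarize_design(design_text, matrix_header_line):
    """Count groups, time points and samples in a tab-separated design file
    and list the samples that do not appear in the matrix header.

    Assumes (hypothetically) that the design file has the columns
    'Sample', 'Time' and 'Group'.
    """
    rows = list(csv.DictReader(io.StringIO(design_text), delimiter="\t"))
    samples = [row["Sample"] for row in rows]
    groups = {row["Group"] for row in rows}
    times = {row["Time"] for row in rows}
    header = matrix_header_line.rstrip("\n").split("\t")
    missing = [s for s in samples if s not in header]
    return {
        "groups": len(groups),
        "times": len(times),
        "samples": len(samples),
        "missing_from_matrix": missing,
    }


if __name__ == "__main__":
    # Toy data only; replace with the contents of your own files.
    design = "Sample\tTime\tGroup\nS1\t0\tA\nS2\t0\tA\nS3\t24\tA\n"
    print(summarize_design(design, "id\tS1\tS2\n"))
```

If the reported group and sample counts differ from what tappAS prints in the log, the design file being loaded is probably not the one you expect.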

Kind regards,

-- Alessandra

rugilemat commented 1 year ago

Hi @almart7,

My apologies, I have been away, and haven't got round to responding to you.

I have just been trying various versions of the design matrix, including the one I attached. I have tried using the isoannot annotation (see the link here: https://data.cyverse.org/dav-anon/iplant/home/rugilemat/isoannot_newGFF3.gff3) as well as the regular Homo sapiens annotation that comes up as a tappAS option.

When I use the options I attached, here's how the log looks:

11:01:44.755 - tappas 1.0.7 using Java 8.0.361-b09
11:01 - Available CPUs for JVM: 12
11:01 - Available memory for JVM: 3.56 GB
11:01 - Setting Rscript file path for Mac OS to /usr/local/bin/Rscript
11:01 - Setting Rscript full file path to: /usr/local/bin/Rscript
11:01 - Setting Rscript full file path to: /usr/local/bin/Rscript
11:01 - Setting Rscript full file path to: /usr/local/bin/Rscript
11:01 - Setting Rscript file path for Mac OS to /usr/local/bin/Rscript
11:01 - Setting Rscript full file path to: /usr/local/bin/Rscript
11:03 - Checking specified annotation file: /Users/rugile/Downloads/Archive/isoannot_newGFF3.gff3
11:03 - Checking design file: /Users/rugile/Downloads/Archive/exp_matrix_female.tsv
11:03 - Design file passed initial check.
11:03 - Checking specified matrix file (Time_Course_Single): /Users/rugile/Downloads/Archive/flair.quantify.counts_rdered.tsv
11:03 - Matrix file passed initial check.
11:03 - The custom annotation file has been modified. We are going to load the version you used to create the project.
11:03 - Annotation loaded correctly. Check if the project contains the correct annotation or you want to create a new one.
11:03 - Input Data dialog results: tappas.DlgInputData$Params@6c341214
11:03 - The custom annotation file has been modified. We are going to load the version you used to create the project.
11:03 - Annotation loaded correctly. Check if the project contains the correct annotation or you want to create a new one.
11:03 - Process Input Data script is running...
11:13 - Processing expression matrix file: /Users/rugile/Downloads/Archive/flair.quantify.counts_rdered.tsv...
11:13 - Experiment type: Single Series Time-Course
11:13 - Matrix has 1 experimental groups, 2 time events, and a total of 18 samples.
11:13 - Sample column mapping: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
11:14 - [WARN] Unable to find annotation entry for 74181 transcript(s). processExpMatrixFile.
11:14 - All transcripts missing annotations WILL BE IGNORED.
11:14 - Expression matrix transcripts missing annotation: ENST00000000233.10_TALONG000086770, ENST00000000412.8_TALONG000112757, ENST00000001008.6_ENSG00000004478.8, ENST00000002165.11-1_ENSG00000034693.15, ENST00000002165.11_ENSG00000034693.15, ENST00000002501.11_ENSG00000003249.15, ENST00000002596.6_TALONG000124139, ENST00000003100.13_TALONG000082353, ENST00000003583.12_ENSG00000001460.18, ENST00000004103.8_ENSG00000002933.9, ENST00000004982.6_ENSG00000004776.13, ENST00000005082.13_ENSG00000005801.18, ENST00000005178.6_TALONG000082472, ENST00000005257.7_TALONG000078657, ENST00000005259.9_TALONG000086413, ENST00000005340.10_ENSG00000004975.12, ENST00000005374.10_ENSG00000006625.18, ENST00000005386.8_ENSG00000005175.10, ENST00000005558.8_TALONG000086550, ENST00000006053.7_ENSG00000006210.7, ENST00000006275.8_ENSG00000007255.10, ENST00000006658.11_ENSG00000006282.21, ENST00000006777.11_TALONG000080994, ENST00000007264.7_ENSG00000007376.8, ENST00000007414.8_ENSG00000006025.12 Only showing the first 25 transcripts.
11:14 - Expression matrix file processing completed.
11:14 - Writing expression matrix factors file to /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/exp_factors.txt
11:14 - Generated expression matrix factors file in 0 ms
11:14 - Writing time course factors file to /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/Data/time_factors.txt
11:14 - Generated time factors file in 0 ms
11:14 - Processing transcripts...
11:14 - Matrix script: /var/folders/1s/272sqggx165c1hm6zhdq5j5r0000gn/T/tappas8652914611628248001.R
11:14 - Rscript: /usr/local/bin/Rscript
11:14 - Running ExpMatrix filter/normalization script: [/usr/local/bin/Rscript, /var/folders/1s/272sqggx165c1hm6zhdq5j5r0000gn/T/tappas8652914611628248001.R, /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/input_matrix.tsv, /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/exp_factors.txt, /Users/rugile/tappasWorkspace/References/UserDefined.01487230048.tappas/transcript_lengths.tsv, /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/input_normalized_matrix.tsv, 0.0, 0.0, N]
11:14 - Input matrix Filter/Normalization process started, process id: java.lang.UNIXProcess@4147c294
11:14 - Input matrix Filter/Normalization process is running...
11:14 - Loading required package: Biobase
11:14 - Loading required package: BiocGenerics
11:14 -
11:14 - Attaching package: 'BiocGenerics'
11:14 -
11:14 - The following objects are masked from 'package:stats':
11:14 -
11:14 - IQR, mad, sd, var, xtabs
11:14 -
11:14 - The following objects are masked from 'package:base':
11:14 -
11:14 - Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
11:14 - as.data.frame, basename, cbind, colnames, dirname, do.call,
11:14 - duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
11:14 - lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
11:14 - pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
11:14 - tapply, union, unique, unsplit, which.max, which.min
11:14 -
11:14 - Welcome to Bioconductor
11:14 -
11:14 - Vignettes contain introductory material; view with
11:14 - 'browseVignettes()'. To cite Bioconductor, see
11:14 - 'citation("Biobase")', and for packages 'citation("pkgname")'.
11:14 -
11:14 - Loading required package: splines
11:14 - Loading required package: Matrix
11:14 - Expression Matrix Filtering and Normalization script arguments: /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/input_matrix.tsv /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/exp_factors.txt /Users/rugile/tappasWorkspace/References/UserDefined.01487230048.tappas/transcript_lengths.tsv /Users/rugile/tappasWorkspace/Projects/Project.01998389807.tappas/InputData/input_normalized_matrix.tsv 0.0 0.0 N
11:14 -
11:14 - Reading input matrix file data...
11:14 - Read 0 transcripts expression data rows
11:14 - Reading factors file data...
11:14 - Reading transcript length file data...
11:14 - Saving results to file...
11:14 - All done.
11:14 - Input matrix Filter/Normalization script ended. Exit value: 0
11:14 - Creating expression matrix file.
11:14 - [ERROR] Invalid line, 2, found in expression matrix file.
11:14 - [ERROR] Unable to write project data results to file.
11:14 - Closing project 'trial'

almart7 commented 1 year ago

Dear @rugilemat,

Sorry for the late reply. I have checked your design file and it has only one group, so it corresponds to a Single Series Time-Course design file. Is there any reason you are trying to create a Multiple Series Time-Course project?

Kind regards,

-- Alessandra

rugilemat commented 1 year ago

Hi @almart7,

My initial design had multiple groups across two time points. However, I thought the complex design might be the reason I couldn't generate the project, so I tried again with fewer samples and a simplified design. Again, no luck.

almart7 commented 1 year ago

Dear @rugilemat,

For tappAS to work, the transcript IDs in your expression matrix have to match the ones in the annotation GFF3 file. You can use isoAnnotLite with your data to produce a valid annotation file: https://isoannot.tappas.org/isoannot-lite/
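A quick way to see how many IDs fail to match before creating the project is to compare the two files directly. This is a minimal sketch, not part of tappAS: the function name `ids_missing_from_gff3` is hypothetical, and it assumes the transcript ID is the first tab-separated column of each GFF3 feature line, so check that against your own annotation file.

```python
def ids_missing_from_gff3(matrix_ids, gff3_lines):
    """Return the matrix transcript IDs that never appear in column 1 of the GFF3.

    Assumption: each non-comment GFF3 line starts with the transcript ID
    followed by a tab. Blank lines and '#' comment lines are skipped.
    """
    annotated = set()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        annotated.add(line.split("\t", 1)[0])
    return sorted(set(matrix_ids) - annotated)


if __name__ == "__main__":
    # Toy data only; in practice read the IDs from the first column of the
    # expression matrix and iterate over the lines of the GFF3 file.
    gff3 = [
        "##gff-version 3\n",
        "ENST00000000233.10\ttappAS\ttranscript\t1\t100\t.\t+\t.\tID=x\n",
    ]
    matrix = ["ENST00000000233.10_TALONG000086770", "ENST00000000233.10"]
    print(ids_missing_from_gff3(matrix, gff3))
```

An empty result means every matrix ID has an annotation entry. A result as long as the matrix itself points to a systematic difference in how the IDs are written.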

I hope this is useful for you!

Kind regards,

Alessandra

rugilemat commented 1 year ago

Hi @almart7,

I have used isoannot to generate the gff3. My initial one came from SQANTI3 but I then ran isoannot to get one from the SQANTI3 output.

almart7 commented 1 year ago

Hello @rugilemat

The problem is that the transcript IDs in the GFF3 have to be the same as the ones in the expression matrix. You could try modifying the expression matrix IDs by removing the second part of each ID (for example, ENST00000361298.9_TALONG000081983) and keeping only the transcript ID (ENST00000361298.9).
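That clean-up could be scripted along these lines. Treat it as a sketch under a few assumptions: the matrix is tab-separated with a header on the first line and the ID in the first column, and everything after the first underscore is the part to drop. The helper names `strip_id_suffix` and `clean_matrix_ids` are made up for this example.

```python
from collections import Counter


def strip_id_suffix(transcript_id):
    """Keep only the part of the ID before the first underscore."""
    return transcript_id.split("_", 1)[0]


def clean_matrix_ids(lines):
    """Rewrite the first column of a tab-separated matrix.

    The first line is treated as the header and left untouched.
    Returns (cleaned_lines, duplicated_ids), where duplicated_ids lists
    IDs that occur more than once after stripping.
    """
    cleaned = []
    seen = Counter()
    for index, line in enumerate(lines):
        if index == 0:
            cleaned.append(line)
            continue
        first, sep, rest = line.partition("\t")
        new_id = strip_id_suffix(first)
        seen[new_id] += 1
        cleaned.append(new_id + sep + rest)
    duplicated = sorted(t for t, count in seen.items() if count > 1)
    return cleaned, duplicated


if __name__ == "__main__":
    # Toy rows only; replace with the lines of your own matrix file.
    rows = ["id\tS1\n", "ENST00000361298.9_TALONG000081983\t12\n"]
    new_rows, dups = clean_matrix_ids(rows)
    print("".join(new_rows), dups)
```

Two caveats: IDs that legitimately contain an underscore would be truncated as well, and two rows can collapse to the same ID after stripping, which is why the duplicates are reported instead of being merged silently.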

I hope this helps!

-- Alessandra