TaliaferroLab / LABRAT

Lightweight Alignment Based Resolution of Alternative Three Prime Ends
MIT License

makeTFfasta yielding small file #6

Closed: rhea9184 closed this issue 1 year ago

rhea9184 commented 1 year ago

Hi,

Thank you so much for having very clear instructions for everything on GitHub. I am trying to run LABRATsc, and I was able to run makeTFfasta (it took a few hours, as mentioned); however, the output TFseqs.fasta I got was only 18.4 MB. I am not sure if this is correct. Here is a screenshot of what the screen looked like while it ran and finished up:

[Screenshot: 2023-02-05 at 9:53:44]

I tried running the salmon alevin command after this and the error message I got was:

Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
Logs will be written to /.mounts/labs/steinlab/private/rhea/neuro/alevin_output/logs
[2023-02-05 10:16:38.304] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2023-02-05 10:16:38.304] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2023-02-05 10:16:38.304] [jointLog] [info] Usage of --validateMappings, without --hardFilter implies use of range factorization. rangeFactorizationBins is being set to 4
[2023-02-05 10:16:38.304] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.2.
[2023-02-05 10:16:38.304] [jointLog] [info] Using default value of 0.87 for minScoreFraction in Alevin
Using default value of 0.6 for consensusSlack in Alevin
[2023-02-05 10:16:38.304] [alevinLog] [error] Index Directory or the txpInfo.bin: /.mounts/labs/steinlab/private/rhea/neuro/tffasta/TFseqs.fasta/txpInfo.bin doesn't exist.

Thank you in advance for your help, Rhea

taliaferrojm commented 1 year ago

Hi Rhea,

The fasta file produced should contain the last 300 nt of every transcript. You can verify this by looking at it, but what you describe sounds about right. I think the problem you are describing is an alevin problem, not a LABRAT problem.
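To check the file as described above, a minimal Python sketch (not part of LABRAT) can count the records in TFseqs.fasta and report the longest sequence. It only uses the 300 nt figure from this reply as an upper bound; `read_fasta` and `summarize_lengths` are hypothetical helper names.

```python
# Hypothetical helpers for inspecting a TFfasta; not part of LABRAT.
def read_fasta(path):
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_lengths(path, expected_max=300):
    """Count records, the longest sequence, and how many exceed expected_max nt."""
    records = too_long = max_length = 0
    for _header, seq in read_fasta(path):
        records += 1
        max_length = max(max_length, len(seq))
        if len(seq) > expected_max:
            too_long += 1
    return {"records": records, "max_length": max_length, "too_long": too_long}
```

For example, `summarize_lengths("TFseqs.fasta")` should report a `max_length` of at most 300 if the file matches the description, and `records` tells you how many transcripts ended up in the file.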

After making that fasta file, you use it to make an index using salmon index. I can't tell for sure, but it looks like you may have skipped that step. See here. You would provide the fasta file you made with the -t option.
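The alevin error above shows `-i` pointing at `TFseqs.fasta/txpInfo.bin`, meaning the FASTA itself was passed where alevin expected an index directory. The following is a minimal Python sketch of the missing step. It only builds the `salmon index` argument list using the `-t` (transcript FASTA) and `-i` (output index directory) options, and it checks for `txpInfo.bin`, the file named in that error. The helper names and the `TFseqs_index` directory name are hypothetical, and treating `txpInfo.bin` as a marker of a valid index is only a heuristic taken from this salmon version's error message.

```python
import os

# Hypothetical helpers; only the -t and -i options of `salmon index` are used.
def salmon_index_cmd(fasta_path, index_dir):
    """Argument list for building a salmon index from the TFfasta.

    -t is the input transcript FASTA; -i is the index directory salmon creates.
    """
    return ["salmon", "index", "-t", fasta_path, "-i", index_dir]


def looks_like_salmon_index(index_dir):
    """Heuristic: the alevin error looked for txpInfo.bin inside the -i path."""
    return os.path.isdir(index_dir) and os.path.isfile(
        os.path.join(index_dir, "txpInfo.bin")
    )
```

You could run it with `subprocess.run(salmon_index_cmd("TFseqs.fasta", "TFseqs_index"), check=True)` and then pass `TFseqs_index`, not the FASTA, to `salmon alevin -i`.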

Happy to help more if needed.

rhea9184 commented 1 year ago

Oh, I think I absolutely did skip the salmon index step. Thank you so much! I will try it out and will reach out if I need more help.

Thank you once again!

rhea9184 commented 1 year ago

Hi Matthew,

You were right: I was able to fix the issue by running salmon index. However, now I have an issue running alevin. It seems to be with the tgMap file. Based on the LABRATsc README, my txp2gene file just lists each transcript ID in both columns (I'm not sure if I misinterpreted the instructions). The error I am getting is:

Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
Logs will be written to /.mounts/labs/steinlab/private/rhea/neuro/alevin_output/logs
[2023-02-06 18:43:48.862] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2023-02-06 18:43:48.862] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2023-02-06 18:43:48.862] [jointLog] [info] Usage of --validateMappings, without --hardFilter implies use of range factorization. rangeFactorizationBins is being set to 4
[2023-02-06 18:43:48.862] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.2.
[2023-02-06 18:43:48.862] [jointLog] [info] Using default value of 0.87 for minScoreFraction in Alevin
Using default value of 0.6 for consensusSlack in Alevin
[2023-02-06 18:43:48.863] [alevinLog] [info] Loading Header
[2023-02-06 18:43:48.864] [alevinLog] [info] Loading Transcript Info
[2023-02-06 18:43:49.073] [alevinLog] [error] ERROR: Txp to Gene Map not found for 49410 transcripts. Exiting

It seems alevin is having a hard time mapping the transcripts to genes because each transcript is listed in both columns. I am not sure how to get around that for LABRATsc. All the alevin troubleshooting I have found so far is geared toward mapping transcripts to genes, but it sounds like LABRATsc deliberately bypasses this to prevent alevin from collapsing multiple transcripts into one gene? (Again, I might be wrong.)
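The error reports that 49410 transcripts have no entry in the tgMap. A minimal Python sketch (hypothetical helpers, not part of LABRAT or salmon) can run a similar check before alevin does: it collects transcript IDs from the TFfasta headers and lists those missing from the first column of the txp2gene file, plus any lines that are not exactly two tab-separated fields. It assumes salmon names each transcript by the header text up to the first whitespace; verify that against your own index.

```python
# Hypothetical diagnostic helpers; not part of LABRAT or salmon.
def fasta_ids(path):
    """Transcript IDs from FASTA headers, cut at the first whitespace (assumed salmon naming)."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.append(parts[0])
    return ids


def tgmap_problems(fasta_path, tgmap_path):
    """Return (missing, malformed).

    missing: FASTA IDs that never appear in the tgMap's first column.
    malformed: 1-based line numbers that are not exactly two tab-separated fields.
    """
    mapped, malformed = set(), []
    with open(tgmap_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            fields = line.split("\t")
            if len(fields) != 2:
                malformed.append(lineno)
                continue
            mapped.add(fields[0])
    missing = [tx for tx in fasta_ids(fasta_path) if tx not in mapped]
    return missing, malformed
```

Empty `missing` and `malformed` lists suggest the file has the shape alevin expects. A space-separated file, for example, would show up with every line flagged as malformed.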

Once again thank you so much for your help!

taliaferrojm commented 1 year ago

This may have to do with how you formatted the txp2gene file. It should just be a two-column, tab-separated text file. Every transcript represented in your TFfasta should also be represented by a single line in this file.
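Following that format, a minimal Python sketch can generate the txp2gene file directly from the TFfasta so that every transcript gets exactly one line. It assumes, as described earlier in this thread for the LABRATsc README, that each transcript maps to itself (the same ID in both columns), and that the ID is the header text up to the first whitespace. `write_txp2gene` is a hypothetical helper, not part of LABRAT.

```python
# Hypothetical helper; assumes each transcript maps to itself in the tgMap.
def write_txp2gene(fasta_path, out_path):
    """Write one 'transcript<TAB>transcript' line per unique FASTA ID; return the count."""
    seen = set()
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in fasta:
            if not line.startswith(">"):
                continue
            parts = line[1:].split()
            # Skip empty headers and repeated IDs so each transcript gets one line.
            if not parts or parts[0] in seen:
                continue
            tx = parts[0]
            seen.add(tx)
            out.write(f"{tx}\t{tx}\n")
    return len(seen)
```

For example, `write_txp2gene("TFseqs.fasta", "txp2gene.tsv")` returns the number of lines written, which should equal the number of records in the FASTA unless it contains duplicate IDs.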

rhea9184 commented 1 year ago

Thank you, I will try again. I think I was missing the second half of that. Thanks!