MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Delimiters csv file #328

Closed by vicgarcas 8 months ago

vicgarcas commented 9 months ago

No error, but I wanted to share something I encountered which, although super simple, delayed me a bit. If samples.csv or features.csv uses ";" as the delimiter instead of "," (my text editor makes this change automatically when I save the file), tinyRNA doesn't work: it gives an error saying either that the file is empty (no columns found) or that the file is outdated. If you can reproduce this error, you may want to add a note to the documentation. It may be super basic, but somebody else with very little coding experience (like me) might run into the same problem. Thanks!

Victoria

vicgarcas commented 9 months ago

Hi again! I've encountered an additional issue related to the delimiters. In features.csv, if I want to use 5' anchored, I cannot use "5' anchored, 0, 4" because the "," are not accepted. I tried using "5' anchored 0 4" and it gave an error, but I also got an error using "5' anchored". It did work with just "anchored". Do you know what I could be doing wrong? Thanks again!

AlexTate commented 9 months ago

Hi @vicgarcas, thank you for bringing this up, and thank you for including your troubleshooting steps. CSV stands for Comma-Separated Values, but as you've discovered this name refers to a large family of formats that can differ in many ways. Python's CSV documentation describes the situation well:

"CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications."

Issue 1

After looking into this, I've learned that it is common for CSV editors to use ";" delimiters in locales that use "," as a decimal separator (though there are exceptions to this rule). I'll see about adding some heuristics to tinyRNA's CSV reader to make it more flexible for these cases. In the meantime, if you tell me the name of the CSV editor you are using I'll see if it can be configured to save files in a format that is compatible with tinyRNA. I recommend this over using a text editor as a workaround.
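As a rough illustration of the kind of heuristic mentioned above, here is a minimal Python sketch built on the standard library's csv.Sniffer. It is not tinyRNA's actual implementation: the function names and the sample column headers (file, group, replicate) are made up for the example. It also shows why a ";"-delimited file looks like it has a single column when it is read with the default comma delimiter.

```python
import csv
import io

def detect_delimiter(text, candidates=",;"):
    """Guess the delimiter of CSV text; fall back to a comma if sniffing fails."""
    try:
        return csv.Sniffer().sniff(text, delimiters=candidates).delimiter
    except csv.Error:
        return ","

def read_rows(text):
    """Parse CSV text using whichever candidate delimiter was detected."""
    return list(csv.reader(io.StringIO(text), delimiter=detect_delimiter(text)))

# Hypothetical file contents; real tinyRNA headers are different.
semicolon_text = "file;group;replicate\na.fastq;control;1\nb.fastq;control;2\n"
comma_text = "file,group,replicate\na.fastq,control,1\nb.fastq,control,2\n"

# Reading the ";" file with the default "," delimiter yields one big column.
naive_rows = list(csv.reader(io.StringIO(semicolon_text)))
print(naive_rows[0])                 # ['file;group;replicate']

# With detection, both variants parse into the same three columns.
print(read_rows(semicolon_text)[0])  # ['file', 'group', 'replicate']
print(read_rows(comma_text)[0])      # ['file', 'group', 'replicate']
```

Sniffing is only a guess based on how consistently a character appears on each line, so a comma fallback is kept for inputs where the guess fails.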

Issue 2

The overlap definition 5' anchored, 0, 4 is perfectly valid. If you make this change using a CSV editor, the definition should be written without surrounding double quotes. If you make this change using a text editor and your delimiter is a comma, you will need to write it with surrounding double quotes like "5' anchored, 0, 4" because the field contains the delimiter character. The same would also be true if your delimiter was ";" and the field contained that character. These formatting details are normally handled automatically by the CSV editor and are hidden from the user. Commas and semicolons are allowed in CSV fields, but the underlying text in the CSV file must be formatted properly.

Edit Jan 24: clarified that when using a text editor, fields need to be quoted only if they contain the delimiter character.
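The quoting rule described for Issue 2 can be checked with Python's standard csv module, which behaves like a CSV editor here: it adds surrounding double quotes only when a field contains the delimiter. This is a small sketch; the two-column row and the helper name are hypothetical and do not reflect the real layout of features.csv.

```python
import csv
import io

# Hypothetical row: a label plus the overlap definition from this thread.
row = ["Class", "5' anchored, 0, 4"]

def to_csv_line(fields, delimiter):
    """Serialize one row the way a CSV editor would, quoting only when needed."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buffer.getvalue()

comma_line = to_csv_line(row, ",")
semicolon_line = to_csv_line(row, ";")
print(comma_line, end="")      # Class,"5' anchored, 0, 4"
print(semicolon_line, end="")  # Class;5' anchored, 0, 4

# Typing the field without quotes in a comma-delimited file splits it apart.
unquoted = next(csv.reader(io.StringIO("Class,5' anchored, 0, 4\n")))
print(unquoted)                # ['Class', "5' anchored", ' 0', ' 4']

# The properly quoted line reads back as the original two fields.
assert next(csv.reader(io.StringIO(comma_line))) == row
```

With a ";" delimiter the same field needs no quotes, because it contains commas but no semicolons.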

vicgarcas commented 8 months ago

Hi Alex, thank you for the reply! I was using either Excel or Numbers - probably neither of them ideal but didn't know what to use. About issue 2, I'll try what you recommend and let you know. Thanks again!

AlexTate commented 8 months ago

@vicgarcas no problem! I've opened a pull request for changes that allow ";" delimiters in CSV files. Once those changes are reviewed and merged by Tai, they'll be available on the master branch. At that point you'll be able to install them by following the steps in the Development Releases installation section.

If you encounter this issue again with other projects, you can change the settings on your computer to make Excel read and write CSVs with comma delimiters. There are two places you can do this and I believe either will work.

Option 1: Excel settings

  1. From the Excel menu in the upper left corner of the screen, click Preferences...
  2. In the Authoring section click Edit
  3. In the Edit Options section uncheck Use System Separators and set the Decimal Separator to a period character
  4. Change these settings back once you've finished your edits

Option 2: System settings

  1. From the Apple menu in the upper left corner of the screen, click System Settings...
  2. Click General
  3. Click Language & Region
  4. In the Number Format setting, select the format that shows a period as the decimal separator
  5. Change these settings back once you've finished your edits