aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License
unix group names with spaces break ls -l | awk in juicer.sh, cleanup.sh, check.sh #216

Closed: eernst closed this issue 2 years ago

eernst commented 3 years ago

Describe the bug

Unix group names can contain spaces, such as "aiden lab", as opposed to the more common "aidenlab". Several scripts parse ls -l output with awk and assume that each value sits in a fixed whitespace-delimited column. That assumption fails when the file is owned by a group with a space in its name, because every column after the group name shifts right by one.

E.g. from scripts/cleanup.sh:

total=`ls -l aligned/merged_sort.txt | awk '{print $5}'`

total will contain "lab" rather than the file size if the group name is "aiden lab".
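
The column shift can be reproduced without creating such a group. The following is a minimal sketch that feeds awk two simulated ls -l lines; the owner name, file size, and date are made up for illustration, and only the group field differs between the two lines.

```shell
# Simulated `ls -l` output lines. The owner ("someuser"), size, and date
# are hypothetical; only the group name differs between the two lines.
normal='-rw-r--r-- 1 someuser aidenlab 1048576 Jan  1 12:00 aligned/merged_sort.txt'
spaced='-rw-r--r-- 1 someuser aiden lab 1048576 Jan  1 12:00 aligned/merged_sort.txt'

# Same extraction the scripts use: take whitespace-delimited field 5.
size_normal=$(printf '%s\n' "$normal" | awk '{print $5}')
size_spaced=$(printf '%s\n' "$spaced" | awk '{print $5}')

echo "group 'aidenlab':  field 5 = $size_normal"
echo "group 'aiden lab': field 5 = $size_spaced"
```

With the single-word group, field 5 is the size (1048576). With "aiden lab", awk splits the group into fields 4 and 5, so field 5 is "lab" and the size lands in field 6.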

This occurs in several places:

$ grep '[^a-zA-Z]ls .*awk' scripts/*
scripts/check.sh:total=`ls -l ${outputdir}/merged_sort.txt | awk '{print $5}'`
scripts/check.sh:total2=`ls -l ${outputdir}/merged_nodups.txt ${outputdir}/dups.txt ${outputdir}/opt_dups.txt | awk '{sum = sum + $5}END{print sum}'`
scripts/cleanup.sh:total=`ls -l aligned/merged_sort.txt | awk '{print $5}'`
scripts/cleanup.sh:total2=`ls -l aligned/merged_nodups.txt aligned/dups.txt aligned/opt_dups.txt | awk '{sum = sum + $5}END{print sum}'`
scripts/cleanup.sh:    testname=$(ls -l fastq | awk 'NR==1{print $9}')
scripts/juicer.sh:    fastqsize=`ls -lL ${fastqdir} | awk '{sum+=$5}END{print sum}'`
scripts/juicer.sh:        testname=$(ls -l ${splitdir} | awk '$9~/fastq$/||$9~/gz$/{print $9; exit}')

To Reproduce

Steps to reproduce the behavior:

  1. Run juicer as a user whose primary group name contains a space.
  2. cleanup.sh will fail (as will other scripts under certain conditions).

Expected behavior

Scripts should work with group names (or user names) that contain spaces. Such names are allowed and occur in practice, for example with institutional authentication via Active Directory.

Adding -g and -G to the ls calls suppresses the owner and group columns in the long listing (GNU ls). With both columns gone, decrementing each awk column reference by 2 ($5 becomes $3, $9 becomes $7) should then resolve the issue.
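
As a rough illustration of the proposed change, the sketch below applies it to a throwaway file rather than to the juicer scripts themselves. It assumes GNU coreutils ls; on BSD/macOS ls, -G enables colorized output instead, so this form is not portable as written. The temporary file and the wc -c cross-check are additions for the demonstration only.

```shell
# Create a temporary file with a known size (5 bytes) for the demonstration.
tmpfile=$(mktemp)
printf 'hello' > "$tmpfile"

# Proposed form (assumes GNU ls): -g omits the owner column and -G omits the
# group column, so the size moves from field 5 to field 3.
total=$(ls -l -g -G "$tmpfile" | awk '{print $3}')

# Cross-check against a byte count that does not depend on ls columns.
expected=$(wc -c < "$tmpfile" | tr -d ' ')

echo "size via ls -l -g -G: $total"
echo "size via wc -c:       $expected"

rm -f "$tmpfile"
```

Because neither name appears in the listing, the field positions no longer depend on whether the owner or group contains a space. File names that contain spaces would still break the $9 (now $7) lookups; that is a separate limitation not covered by this change.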
