PeanutBase / PeanutbaseWebsite

Repo to document and track issues pertaining to PeanutBase website.

Expression:Legacy expression atlas download tables have outdated links to GBrowse and Tripal/Chado features #24

Open sdash-github opened 1 year ago

sdash-github commented 1 year ago

Origin: observed by @adf-ncgr in an email.
From: Andrew Farmer (adf@ncgr.org)
To: Sudhansu Dash (NCGR) (sdash@ncgr.org)
Subject: Re: Report on peanut database
Date: Tue, 29 Aug 2023 08:45:31 -0600

it was the links to html pages served from /falafel; for some reason the filesystem mount didn't come back on the restart, although the /etc/fstab entry that should have controlled it looks OK to me. I did notice that on the pages like: https://www.peanutbase.org/files/expression/arahy_atlas_html/LateralStem_Leaves-MainStem_Leaves_up.html the links to the GBrowse are all broken since they haven't been updated to point to legacy.peanutbase.org, maybe you could take care of it? adf

SDash TO DO:

sdash-github commented 1 year ago

Test after duplicating the files/expression/?? dirs. Something like:

find . -type f -name "*.txt" -exec sed -i'' -e 's/foo/bar/g' {} +

or

sed -i 's/foo/bar/g' $(find . -type f)

https://superuser.com/questions/428493/how-can-i-do-a-recursive-find-and-replace-from-the-command-line
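A minimal sketch of the first form, run against a throwaway tree rather than the real files/expression copy. The directory and file names below are hypothetical, and it assumes a sed that accepts an attached backup suffix (`-i.bak`), which both GNU and BSD sed do. The `$(find . -type f)` form splits on whitespace, so it is avoided here because one of the demo paths contains a space.

```shell
set -eu

# Hypothetical scratch tree standing in for the duplicated expression dirs.
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b c"
printf 'foo one\nkeep me\n' > "$work/a/x.txt"
printf 'foo two foo\n'      > "$work/b c/y.txt"   # directory name contains a space
printf 'foo untouched\n'    > "$work/a/z.html"    # not *.txt, so it is left alone

# "{} +" hands many files to a single sed call; find passes each path as one argument.
# -i.bak edits in place and keeps the original next to each file as <name>.bak.
find "$work" -type f -name '*.txt' -exec sed -i.bak -e 's/foo/bar/g' {} +

# grep -c exits 1 when the count is 0, hence the "|| true" under "set -e".
remaining=$(cat "$work/a/x.txt" "$work/b c/y.txt" | grep -c 'foo' || true)
echo "lines still containing foo in .txt files: $remaining"
```

The `.bak` copies give a quick way back if the pattern turns out to be wrong; they can be deleted once the result has been checked.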

sdash-github commented 1 year ago

Copied files/expression to /falafel/peanutbase/sd-work/expression (cp -p -r files/expression sd-work/). 12 files 1985 files. Not all of them are HTML files that need the URL update.

sdash-github commented 1 year ago

Diagnostic exploration:

find . -type f -name "*.html" -exec grep -e "gbrowse_peanut1" {} \; | less -N
find . -type f -name "*.html" -exec grep -e "gbrowse_peanut1" {} \; | wc -l
# 6600746 lines need replacement
find . -type f -name "*.html" -exec grep -F -o  "href=\"/gbrowse_peanut1.0?query=name=" {} \; | less -N
# 6600746
find . -type f -name "*.html" -exec grep -l  "href=\"/gbrowse_peanut1.0?query=name=" {} \; | less -N    # 924
find . -type f -name "*.html" -exec grep -l  "href=\"/gbrowse_peanut1.0?query=name=" {} \; | wc -l
# 924 files contain this URL fragment
find . -type f -name "*.html" -exec grep -l  "href=\"/gbrowse_peanut1.0?query=name=" {} \; > htmlFilesWithGbrowseUrl.list.txt
cat htmlFilesWithGbrowseUrl.list.txt | grep arahy_atlas_html | wc -l
# 924: all of them are in dir ./expression/arahy_atlas_html
tree  expression/arahy_atlas_html/ | less
# 0 directories, 924 files: confirming this is where I need to change/update the URLs
find ./expression/arahy_atlas_html -type f -name "*.html"  | wc -l      #924 

find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  grep -l  "href=\"/gbrowse_peanut1.0?query=name=" {} \; | wc -l
# 924

Conclusion:

Only the 924 HTML files in ./expression/arahy_atlas_html (all of the files in that directory) need to be updated.
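The same two counts (files containing the link, and matching lines) can also be obtained without spawning one grep per file. The sketch below assumes GNU grep (`-r` and `--include` are not POSIX) and uses a hypothetical miniature tree in place of the real expression directory.

```shell
set -eu

# Hypothetical miniature of the expression tree: two pages with links, one without.
demo=$(mktemp -d)
mkdir -p "$demo/arahy_atlas_html" "$demo/other"
pat='href="/gbrowse_peanut1.0?query=name='
printf '<tr><td><a %sg1">g1</a></td></tr>\n<tr><td><a %sg2">g2</a></td></tr>\n' "$pat" "$pat" \
  > "$demo/arahy_atlas_html/a.html"
printf '<tr><td><a %sg3">g3</a></td></tr>\n' "$pat" > "$demo/arahy_atlas_html/b.html"
printf '<p>no links here</p>\n' > "$demo/other/c.html"

# -r recurses, --include limits the search to *.html, -F matches the pattern
# literally (so "." is not a regex wildcard), -l lists each matching file once.
files=$(grep -rlF --include='*.html' "$pat" "$demo" | wc -l | tr -d ' ')
# Without -l, grep prints every matching line, so wc -l counts lines to be rewritten.
lines=$(grep -rF --include='*.html' "$pat" "$demo" | wc -l | tr -d ' ')
echo "files: $files, matching lines: $lines"
```

On the real tree, the directory argument would be `.` and the two numbers should agree with the 924 files and 6600746 lines reported above.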

sdash-github commented 1 year ago

Substitution and test done at dir falafel/peanutbase/sd-work/expression

##Now try substitution via sed:
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed -n '/href=\"\/gbrowse_peanut1.0?query=name=/p' {} \; | less -N
#6600746  works
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed -n '/href=\"\/gbrowse_peanut1.0?query=name=/p' {} \; | wc -l
# 6600746 => This find and exec can work for sed substitution.
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed  -n 's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|pg' {} \; | grep "href=\"/gbrowse_peanut1.0?query=name=" | less -N  ## Empty output, as expected after substitution
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed  -n 's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|pg' {} \; | grep "href=\"/gbrowse_peanut1.0?query=name=" | wc -l  ## 0, as expected after substitution
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed  -n 's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|pg' {} \; | grep "href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=" | less -N
# 6600746 => The substitution worked. Checked the first few lines.
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed  -n 's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|pg' {} \; | grep "href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=" | wc -l
# 2nd verification with line count
# The modified URLs after substitution worked, tested a few.
#  
# Substitution in place: 
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed -i  -n 's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|pg' {} \;
#  
## Now test after substitution done in place:
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  grep -l  "href=\".*legacy.*/gbrowse_peanut1.0?query=name=" {} \; | wc -l
# 924: same number of files before and after substitution
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  grep -l  "href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=" {} \; | wc -l
# 924: same count confirmed with a more complete part of the URL
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  grep   "href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=" {} \; | wc -l
# 6600746  # CONFIRMED SUBSTITUTION WORKED: same number of matching lines before and after substitution

TO DO: Now Replace old expression dir with new at sdash-work/expression/

sdash-github commented 1 year ago

Now Replace old expression dir with new at sdash-work/expression/

Problem: The substituted files are missing the <h2> and <table> tags, so the pages fail to display their content in tabular form.

#Now Replace old expression dir with new at sdash-work/expression/
cp -p -r expression ../files/
# Copied from falafel peanutbase/sdash-work/ after renaming the old dir at peanutbase/files/: mv expression expression-old
# Website check:
# The pages render without any white space (no gaps between columns). WHY??
# No page heading and no table header row.
diff expression/arahy_atlas_html/LateralStem_Leaves-VegetativeShootTip_up.html   \
../files/expression/arahy_atlas_html/LateralStem_Leaves-VegetativeShootTip_up.html
# diff finds no differences => my file substitution was okay, but something else is happening.
# The unchanged file at expression-old contains <h2>xxxx</h2> and <table>...</table> tags;
# the substituted file is missing these.

sdash-github commented 1 year ago

Ask @adf-ncgr: My find -exec sed substitution resulted in the disappearance of the <h2> and <table> opening and closing tags along with their inner HTML. All the data within the <tr> and <td> tags is there, no problem there. Only the <table> and </table> tags disappeared from the beginning and end of the file.

My question: Any obvious defect in the cmd below?

# Substitution in place: 
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed -i  -n 's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|pg' {} \;

adf-ncgr commented 1 year ago

I think what's happening is that non-matching lines are getting suppressed. I would do it without using -n and without using the p command after your substitute. That should have the effect of printing all lines regardless of whether the substitution has been applied or not, I think.
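The effect described above can be reproduced on a small file. This is only an illustrative sketch: the temporary files and the four-line page are made up, and `-i.bak` is used instead of a bare `-i` so that the originals are kept.

```shell
set -eu

d=$(mktemp -d)
# Made-up four-line page: heading, table open, one data row with a link, table close.
printf '<h2>Title</h2>\n<table>\n<tr><td><a href="/gbrowse_peanut1.0?query=name=g1">g1</a></td></tr>\n</table>\n' \
  > "$d/with_n.html"
cp "$d/with_n.html" "$d/without_n.html"

sub='href="/gbrowse_peanut1.0?query=name=|href="https://legacy.peanutbase.org/gbrowse_peanut1.0?query=name='

# -n turns off automatic printing; the p flag then prints only the lines on
# which a substitution happened, so every other line is dropped from the file.
sed -i.bak -n "s|${sub}|pg" "$d/with_n.html"
# Default behaviour: every line is written back, substituted or not.
sed -i.bak "s|${sub}|g" "$d/without_n.html"

kept_with_n=$(wc -l < "$d/with_n.html" | tr -d ' ')
kept_without_n=$(wc -l < "$d/without_n.html" | tr -d ' ')
echo "lines kept with -n and p: $kept_with_n; without: $kept_without_n"
```

With `-n` and `p` only the `<tr>` row survives, which matches the symptom reported above: the data rows are present but `<h2>`, `<table>` and `</table>` are gone.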

sdash-github commented 1 year ago

That worked. Thanks.

## Need to do the sed substitution again without -n and without the p flag
#rm -r -f expression/*
# Removed ../sd-work/expression
rmdir expression/
cp -p -r ../files/expression  .  # Copying a fresh copy of dir expression to sd-work for testing
find ./expression/arahy_atlas_html -type f -name "*.html"  -exec  sed -i  's|href=\"\/gbrowse_peanut1.0?query=name=|href=\"https:\/\/legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|g' {} \;
#sed in place substitution without '-n' and '/p' in sd-work test dir
#One file checked and has necessary table and h2 tags.
##mv ../files/expression  ../files/expression-old-copy
mv expression ../files/
#Checked page at https://dev.peanutbase.org/expression/expr_tissue_Hyp.html. It is well formed and the links work (they go to the legacy site GBrowse).
## DONE: This task done.
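One way to check the whole tree rather than a single file is to diff the updated directory against the untouched copy and confirm that every changed line is a GBrowse link. The sketch below uses hypothetical scratch directories named after expression-old and expression; it is an assumption about how such a check could look, not the procedure actually used here.

```shell
set -eu

root=$(mktemp -d)
mkdir -p "$root/expression-old" "$root/expression"
printf '<h2>Title</h2>\n<table>\n<tr><td><a href="/gbrowse_peanut1.0?query=name=g1">g1</a></td></tr>\n</table>\n' \
  > "$root/expression-old/page.html"
cp -p "$root/expression-old/page.html" "$root/expression/page.html"
sed -i.bak 's|href="/gbrowse_peanut1.0?query=name=|href="https://legacy.peanutbase.org/gbrowse_peanut1.0?query=name=|g' \
  "$root/expression/page.html"
rm "$root/expression/page.html.bak"

# diff exits 1 when the trees differ, so guard it under "set -e".
changes=$(diff -r "$root/expression-old" "$root/expression" || true)
# In diff's default output, "<" lines come from the old tree and ">" lines from the new one.
removed=$(printf '%s\n' "$changes" | grep -c '^<' || true)
added=$(printf '%s\n' "$changes" | grep -c '^>' || true)
# Any changed line that does not mention gbrowse_peanut1.0 would be unexpected.
unexpected=$(printf '%s\n' "$changes" | grep '^[<>]' | grep -vc 'gbrowse_peanut1.0' || true)
echo "removed=$removed added=$added unexpected=$unexpected"
```

Equal `removed` and `added` counts with `unexpected=0` suggest that lines were rewritten one-for-one and that nothing such as `<h2>` or `<table>` was dropped.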

sdash-github commented 1 year ago

Other files also need similar link updates

In the ..peanutbase/files/expression/ dir, served at https://dev.peanutbase.org/expression/
These are, I think, mostly non-hypogaea data pages.
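A possible first step for this follow-up is to list which directories still contain root-relative GBrowse links. The sketch below is an assumption-laden illustration: the `other_species_html` directory and the `/gbrowse_hypothetical` link prefix are invented for the demo, the real link prefixes in the remaining pages are not known from this thread, and GNU grep is assumed for `-r` and `--include`.

```shell
set -eu

site=$(mktemp -d)
mkdir -p "$site/expression/arahy_atlas_html" "$site/expression/other_species_html"
# Already updated page: absolute link to the legacy host.
printf '<a href="https://legacy.peanutbase.org/gbrowse_peanut1.0?query=name=g1">g1</a>\n' \
  > "$site/expression/arahy_atlas_html/done.html"
# Invented pages that still carry root-relative links.
printf '<a href="/gbrowse_hypothetical?query=name=g2">g2</a>\n' > "$site/expression/other_species_html/todo1.html"
printf '<a href="/gbrowse_hypothetical?query=name=g3">g3</a>\n' > "$site/expression/other_species_html/todo2.html"

# List files with href="/gbrowse..., strip the file name, and count files per directory.
report=$(grep -rlF --include='*.html' 'href="/gbrowse' "$site/expression" \
  | sed 's|/[^/]*$||' | sort | uniq -c)
echo "$report"
pending_dirs=$(printf '%s\n' "$report" | wc -l | tr -d ' ')
```

Each output line is a file count followed by a directory, which gives a rough to-do list before any substitution is attempted.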