Closed alaric-dotmesh closed 4 years ago
Gotcha:
time="2019-10-14T09:19:09Z" level=info msg="<transition> 66429de2-487e-4df3-839e-3e1c7a1f7a66 to active waiting (from active waiting, 0.83s ago)"
time="2019-10-14T09:19:09Z" level=info msg="entering active state for 66429de2-487e-4df3-839e-3e1c7a1f7a66"
time="2019-10-14T09:19:09Z" level=debug msg="[activeState] moving to next state" from=fileOutputIO
time="2019-10-14T09:19:09Z" level=info msg="<transition> 66429de2-487e-4df3-839e-3e1c7a1f7a66 to backoff pausing (from active waiting, 0.00s ago)"
2019/10/14 09:19:09 entering backoff state for 66429de2-487e-4df3-839e-3e1c7a1f7a66
Further logging reveals the next stage in the problem:
time="2019-10-14T15:04:40Z" level=error msg="[readFile] Error statting" error="stat /var/lib/dotmesh/mnt/dmfs/66429de2-487e-4df3-839e-3e1c7a1f7a66@3965e857-246c-4e8a-931a-6860000dd80a/__default__/audio-tag/notebooks/CNN.ipynb: no such file or directory" filename=audio-tag/notebooks/CNN.ipynb sourcePath=/var/lib/dotmesh/mnt/dmfs/66429de2-487e-4df3-839e-3e1c7a1f7a66@3965e857-246c-4e8a-931a-6860000dd80a/__default__/audio-tag/notebooks/CNN.ipynb
The directory is there, but no CNN.ipynb
inside it.
...is the problem that we're comparing the snapshot that CREATED CNN.ipynb
to the snapshot before, when it didn't exist?
It seems so! There's two snapshots mounted:
CNN.ipynb
CNN.ipynb
The diff URL that's showing the 500 error is:
https://cloud.dotscience.com/diff? base=%2Fv2%2Fprojects%2F229bf822-d3cb-403e-af6d-fde9a5c07218%2Ffiles%2Fsnapshot%2F3965e857-246c-4e8a-931a-6860000dd80a%2Faudio-tag%252Fnotebooks%252FCNN.ipynb%3FshortLivedToken%3DeyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2NvdW50X2lkIjoiOGE5ODA4MjAtOTM5MS00NTdlLTkyMDktZjczZTRjOGU5MjM3IiwiZW1haWwiOiJrc2hpdGl6Lm1hbmRhbEBmdXNlbWFjaGluZXMuY29tIiwiZXhwIjoxNTcxMTUxODc3LCJpYXQiOjE1NzEwNjU0NzcsInN1YiI6IjhhOTgwODIwLTkzOTEtNDU3ZS05MjA5LWY3M2U0YzhlOTIzNyIsInVzZXJuYW1lIjoia3NoaXRpem1hbmRhbCJ9._fiA2Ym4bxd5OtC0Iy5WFj36tHWZPtD3kjzJnaI_61MRTx0q1BhQIQLSNNXA95vJ36qALjMGFFwxN8c47ASJyg &remote=%2Fv2%2Fprojects%2F229bf822-d3cb-403e-af6d-fde9a5c07218%2Ffiles%2Fsnapshot%2F6ef5e876-bddf-44ee-bbb7-7ed3cbfeae78%2Faudio-tag%252Fnotebooks%252FCNN.ipynb%3FshortLivedToken%3DeyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2NvdW50X2lkIjoiOGE5ODA4MjAtOTM5MS00NTdlLTkyMDktZjczZTRjOGU5MjM3IiwiZW1haWwiOiJrc2hpdGl6Lm1hbmRhbEBmdXNlbWFjaGluZXMuY29tIiwiZXhwIjoxNTcxMTUxODc3LCJpYXQiOjE1NzEwNjU0NzcsInN1YiI6IjhhOTgwODIwLTkzOTEtNDU3ZS05MjA5LWY3M2U0YzhlOTIzNyIsInVzZXJuYW1lIjoia3NoaXRpem1hbmRhbCJ9._fiA2Ym4bxd5OtC0Iy5WFj36tHWZPtD3kjzJnaI_61MRTx0q1BhQIQLSNNXA95vJ36qALjMGFFwxN8c47ASJyg
...and those are the snapshot IDs embedded in that URL.
Suggestions:
We probably don't need to go into backoff state when a file is not found in this case, but it's probably not doing any harm (now we have readFile logging). I'll remove the moving to next state
logging but propose the readFile
logging for master, and add the same sort of logging in saveFile
.
We also need to return a useful 404 error when we request a nonexistant file via S3, to help spot this kind of problem faster in future. I'll see if I can do that too.
We need to think about the desired UX for looking at a diff that creates (or deletes?) a file and implement that!
Product feedback on point 3: https://dotscience.slack.com/archives/C7PHSRL82/p1571070744378500
PR for point 1: https://github.com/dotmesh-io/dotmesh/pull/758
PR for point 2: https://github.com/dotmesh-io/dotmesh/pull/759
Point 3 is to be resolved by:
FuseMachines are reporting multiple issues with the notebook diff view breaking, eg https://cloud.dotscience.com/project/229bf822-d3cb-403e-af6d-fde9a5c07218/runs/file/audio-tag%252Fnotebooks%252FCNN.ipynb/cf2d5227-61a0-4e51-85c3-cfd30f867e12?page=0
Seems to be a problem with the S3 interface used to fetch the notebook files; the FSMs are going into the backoff state and we're not sure why.