C2DH / jdh-notebook

A collection of Jupyter notebooks for the Journal of Digital History
https://journalofdigitalhistory.org
GNU Affero General Public License v3.0

Technical review Tracking and tracing audiovisual reuse: Introducing the Video Reuse Detector #103

Closed. eliselavy closed this issue 5 months ago

eliselavy commented 1 year ago

Question about the use of FFmpeg

https://github.com/jdh-observer/BuWvtJFxh3wy

eliselavy commented 1 year ago

Problem in preview: https://journalofdigitalhistory.org/en/notebook-viewer/JTJGcHJveHktZ2l0aHVidXNlcmNvbnRlbnQlMkZUb21hc1Nrb3RhcmUlMkZ0ZXN0X3ZyZF9yZXBvJTJGbWFpbiUyRlRyYWNraW5nJTI1MjAtJTI1MjBNYXJpYSUyNTIwZWRpdGlvbiUyNTIwdy4lMjUyMFRvbWFzJTI1MjBGaXhlcy1WMTAuaXB5bmI=

The raw URL is not available with Git LFS. To get the raw URL:

https://raw.githubusercontent.com/<username>/<repository>/<branch>/<path>/<filename>?token=<LFS_token>

To get your Git LFS token, go to the "Settings" tab of your GitHub repository and click on "Secrets" in the left sidebar. Then create a new secret with the name "GIT_LFS_TOKEN" and set its value to your Git LFS token. After you have modified the URL with your Git LFS token, you should be able to access the raw file through Git LFS. @mariacheriksson does it work? While waiting for author feedback, I will also look at https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md
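For reference, a minimal sketch of the batch API request that document describes (assumptions: a public repository; <owner>, <repo> and the oid/size values are placeholders that would come from the LFS pointer file committed in the repository):

import requests

# Git LFS batch API: ask the LFS server for a temporary download URL for an object.
batch_url = "https://github.com/<owner>/<repo>.git/info/lfs/objects/batch"
headers = {
    "Accept": "application/vnd.git-lfs+json",
    "Content-Type": "application/vnd.git-lfs+json",
}
payload = {
    "operation": "download",
    "transfers": ["basic"],
    # oid (sha256) and size are read from the LFS pointer file for the notebook
    "objects": [{"oid": "<sha256-from-pointer-file>", "size": 123456}],
}

response = requests.post(batch_url, json=payload, headers=headers)
response.raise_for_status()
for obj in response.json()["objects"]:
    # each object entry lists a temporary, directly downloadable URL
    print(obj["actions"]["download"]["href"])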

eliselavy commented 1 year ago

About the images: time to implement the gallery tag https://github.com/C2DH/journal-of-digital-history/issues/98

Screenshot 2023-04-04 at 12 14 13

Need to speak with @danieleguido at the JDH meeting

mariacheriksson commented 1 year ago

Question about the use of FFmpeg

https://github.com/jdh-observer/BuWvtJFxh3wy

Hi Elisabeth! This link seems to be broken - maybe try adding it again?

mariacheriksson commented 1 year ago

Another question that we have concerns adding videos to the notebook. We'd like to add two clips, and these files are stored in the folder media/demo in our GitHub repo. We tried using the code that can be found in the journal's author guidelines, but can't seem to get it to work. Any ideas why (see copy of code below)? Also, the captions/metadata are rendered twice when we use the code below. Possible bug?

from IPython.display import Video, display
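# The "jdh" metadata below carries the caption/source that the journal renders for this object.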
metadata={
    "jdh": {
        "module": "object",
        "object": {
            "type":"image",
            "source": [
                "The Eagle Has Landed: The Flight of Apollo 11 (1969)"
            ]
        }
    }
}
display(Video("media/demo/The Eagle Has Landed x265.mp4"), metadata=metadata)

@mariacheriksson It appears twice because it is included twice.

It means:

Screenshot 2023-04-04 at 14 57 05

  • you can insert the caption by using the metadata
  • you can insert the caption in the code cell => choose one of the two ways

By the way, in the Git LFS version you don't have this, because there you included the caption only in the code.

@mariacheriksson About the video: in the guidelines we provide a code snippet to display a Vimeo video (see here: https://journalofdigitalhistory.org/en/article/33pRxE2dtUHP?idx=139 or in the guidelines https://journalofdigitalhistory.org/en/guidelines?idx=252&layer=narrative&lh=643&pidx=252&pl=narrative&y=93.5). We encourage this approach: the full video is not rendered in the output cell, and you also save space in your GitHub repository, as the two videos are > 40 MB. For archiving, we usually transfer the videos to the JDH Vimeo account after review.
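For reference, a minimal sketch of that kind of embed (the video id below is a placeholder, not an actual JDH video):

from IPython.display import VimeoVideo

# Embed a Vimeo video by its id instead of committing the video file to the repository.
VimeoVideo("123456789", width=800, height=450)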

@mariacheriksson Sorry Maria, I didn't see that it was a video from the Internet Archive; you can use an IFrame as below:

from IPython.display import IFrame
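# Embed the Internet Archive item in an iframe instead of committing the large video file to the repository.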
IFrame('https://archive.org/details/journey-through-the-solar-system-episode-06-the-moon-us', width=800, height=600)
Screenshot 2023-04-04 at 15 45 29
mariacheriksson commented 1 year ago

Note that the metadata/captions are also rendered twice when we display ordinary photographs, using this code:

from IPython.display import Image, display

metadata={ "jdh": { "module": "object", "object": { "type":"image", "source": ["Suggestion of where to set the threshold in the \ analysis of a distance metric histogram." ] } } }

display(Image("./media/demo/distance_illustration.jpg"), metadata=metadata) @mariacheriksson it's appear two times because it 's included two times It means:

mariacheriksson commented 1 year ago

We also wanted to ask if the journal has a Pandas style, so that we can make our tables and data output look a bit nicer. If not, would it be possible to get access to the color code for the journal's hermeneutics layer?
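For example, something along these lines is roughly what we have in mind (the table and the hex colour below are just placeholders, not the journal's actual colour):

import pandas as pd

# Placeholder data and colour; "#f5e9d7" is not the journal's official hermeneutics colour.
df = pd.DataFrame({"video": ["clip_a", "clip_b"], "matches": [12, 7]})
styled = (
    df.style
      .set_properties(**{"background-color": "#f5e9d7"})
      .set_table_styles([{"selector": "th", "props": [("text-align", "left")]}])
)
styled  # rendered as a styled HTML table in the notebook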

mariacheriksson commented 1 year ago

About the images: time to implement the gallery tag C2DH/journal-of-digital-history#98

Screenshot 2023-04-04 at 12 14 13

Need to speak with @danieleguido at the JDH meeting

Adding a gallery feature sounds like a nice idea, but I'm afraid it probably won't help us in our article. What we are asking for is really a feature for collapsing/shrinking cell outputs and making them scrollable (up and down). If you open our notebook in a regular browser (without looking at it through the journal's preview system) you will see that this is used in every code cell in our hermeneutics layer. Would something like this be possible?

mariacheriksson commented 1 year ago

One final question: we've followed the author guidelines but can't get the "narrative-step" and "hermeneutics-step" functions to work (nothing seems to happen when we tag the cells according to the quotations above). Is this function supposed to work, and if so do you have any idea what we are doing wrong?

@mariacheriksson narrative-step and hermeneutics-step are no longer supported at the moment. The guidelines should have been updated... These tags are no longer relevant:

Screenshot 2023-04-04 at 14 25 29
TomasSkotare commented 1 year ago

Problem in preview: https://journalofdigitalhistory.org/en/notebook-viewer/JTJGcHJveHktZ2l0aHVidXNlcmNvbnRlbnQlMkZUb21hc1Nrb3RhcmUlMkZ0ZXN0X3ZyZF9yZXBvJTJGbWFpbiUyRlRyYWNraW5nJTI1MjAtJTI1MjBNYXJpYSUyNTIwZWRpdGlvbiUyNTIwdy4lMjUyMFRvbWFzJTI1MjBGaXhlcy1WMTAuaXB5bmI=

The raw URL is not available with Git LFS. To get the raw URL:

https://raw.githubusercontent.com/<username>/<repository>/<branch>/<path>/<filename>?token=<LFS_token>

To get your Git LFS token, go to the "Settings" tab of your GitHub repository and click on "Secrets" in the left sidebar. Then create a new secret with the name "GIT_LFS_TOKEN" and set its value to your Git LFS token. After you have modified the URL with your Git LFS token, you should be able to access the raw file through Git LFS. @mariacheriksson does it work? While waiting for author feedback, I will also look at https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md

Hello!

I tried to get this to work but I couldn't get past the token step, as I was unsure where the secret should be saved, as well as what the value of the secret should be. Is it the token to the repo? Because it's public, so that shouldn't strictly be necessary - but perhaps I misunderstand!

Regardless, my best effort to find a direct download link resulted in: https://media.githubusercontent.com/media/TomasSkotare/test_vrd_repo/main/Tracking%20-%20Maria%20edition%20w.%20Tomas%20Fixes-V10.ipynb

This results in a download, even when using tools such as wget.

However, it doesn't work in your viewer, so I'm sure I'm missing something!

If you have any extra information I would be very thankful!

@TomasSkotare Let me experiment with this, as it's also our first article using LFS. I will get back to you.

mariacheriksson commented 1 year ago

Hey @eliselavy! Aside from the unanswered questions above, I noticed that the following syntax doesn't work to make links open in a new window: link{:target="_blank"}. Any idea how this can be fixed?

@mariacheriksson A fix for this issue will be available in production soon: https://github.com/C2DH/journal-of-digital-history/issues/538

mariacheriksson commented 1 year ago

Note that the metadata/captions are also rendered twice when we display ordinary photographs, using this code:

from IPython.display import Image, display

metadata={ "jdh": { "module": "object", "object": { "type":"image", "source": ["Suggestion of where to set the threshold in the analysis of a distance metric histogram." ] } } }

display(Image("./media/demo/distance_illustration.jpg"), metadata=metadata) @mariacheriksson it's appear two times because it 's included two times It means:

  • you can insert the caption by using the metadata
  • you can insert the caption in the code cell => you choose one way
Screenshot 2023-04-04 at 14 57 34

Hey! Not sure I understand what you mean... I have copy-pasted the exact same code snippet that is shown in the author guidelines (see below). What part of this code should I remove to avoid the caption being included twice?:

metadata={ "jdh": { "module": "object", "object": { "type":"image", "source": ["Suggestion of where to set the threshold in the analysis of a distance metric histogram." ] } } }

display(Image("./media/demo/distance_illustration.jpg"), metadata=metadata)

eliselavy commented 1 year ago

@mariacheriksson My bad Maria, I looked at it a bit quickly. I thought you had also used the metadata to insert labels. See the screenshot below.

Screenshot 2023-04-11 at 13 02 54

But in fact it is a side effect of the figures not having been tagged as figures: if you add a tag to specify that a cell is a figure (for example: figure-layer-*), the label is not duplicated.

Screenshot 2023-04-11 at 13 02 37
Screenshot 2023-04-11 at 13 02 30

Thank you for your understanding.

mariacheriksson commented 1 year ago

@mariacheriksson My bad Maria, I looked at it a bit quickly. I thought you had also used the metadata to insert labels. See the screenshot below.
Screenshot 2023-04-11 at 13 02 54

But in fact it is a side effect of the figures not having been tagged as figures: if you add a tag to specify that a cell is a figure (for example: figure-layer-*), the label is not duplicated.
Screenshot 2023-04-11 at 13 02 37
Screenshot 2023-04-11 at 13 02 30

Thank you for your understanding.

Ah - fabulous! Thank you @eliselavy. I had forgotten to add the proper tags :) now it works!

mariacheriksson commented 1 year ago

@eliselavy @TomasSkotare

So, just to recap, here are our remaining questions:

  1. We are not sure if there's a better way of installing ffmpeg than our current solution in the notebook. The link you added above (https://github.com/jdh-observer/BuWvtJFxh3wy) is unfortunately broken. Did you find another solution? Maybe try adding it again?

  2. Does the journal have a Pandas style? We are asking to see if there's a way to make our tables and data output look a bit nicer. If not, would it be possible to get the color code for the journal's hermeneutics layer?

  3. As a solution to the poor way in which our data is shown in the hermeneutics layer, you suggested adding a gallery feature. This sounds like a nice idea, but I'm afraid it won't help us in our article. What we are asking for is really a feature for collapsing/shrinking cell outputs and making them scrollable (up and down). If you open our notebook in a regular browser (without looking at it through the journal's preview system) you will see that this is used in every code cell in our hermeneutics layer. Would something like this be possible?

  4. Tomas had some questions regarding Git LFS, but you already announced you will get back to him regarding this!

  5. I'm trying to get the feature for making links open in an external window work, but I'm not succeeding. For instance, I'm following the author guidelines and adding the following in a markdown cell:

RosaMannen{:target="_blank"}

But when this is rendered in the journal's preview tool, a new tab/window does not open when I click on the link. Any idea what I'm doing wrong?

  6. In the narrative layer of our article, we would like to add a series of images that have a low height. We noticed that a figure style of min. 339 pixels height is currently applied to all figures, and this creates a lot of empty space/margins below all our pictures (see screenshot below). Is there a way to remove or circumvent this standardized pixel height, so that we can show our pictures in a more elegant way?

image

Thank you so much for your help!

eliselavy commented 1 year ago

@mariacheriksson

mariacheriksson commented 1 year ago

@eliselavy

Great. Thank you so much for your replies above. Will get back to you if there's anything that still doesn't work.

Regarding question 3 and the possibility to collapse cell outputs: we have no problem doing this when we read our article through an ordinary Jupyter notebook URL. The problem arises when our article is rendered in the journal. For example, if you go to the subchapter "Step 6. Output matching results" in our article and have a look at the cell outputs, you will see that these are "collapsed" into a scrollable section when viewed in the Jupyter notebook. When you preview the article through the journal's webpage, however, the cell outputs are no longer collapsed and instead each output (matched sequence of frames) is shown one after the other. Since we are sometimes outputting 200 sequences at once, this means that readers have to scroll through a very, very large amount of data when they read the article. This is far from ideal, and we would be very happy if it were possible to collapse the outputs in a similar way as in the usual notebook. If this is not possible, we will seriously have to reduce the amount of data shown in our hermeneutics layer and rewrite large parts of the text. Otherwise, the article will simply not be readable.

I'm attaching two screenshots below that show the difference in notebook and journal preview. Do let me know if this is not explained properly!

image

image

eliselavy commented 1 year ago

@mariacheriksson Ah OK, and by tagging the cell as a figure, you will have the dataframe first and the images after.

Screenshot 2023-04-13 at 14 15 21

eliselavy commented 1 year ago

Email sent to the authors (04/17/2023) about access to Git LFS / the problem of the number of images

Screenshot 2023-04-17 at 11 55 43

You have 11 cells with show_limit=200. Each of the cells mentioned above displays 200 pictures along with 200 tables, so for 11 cells you will show 2200 images alongside 2200 tables, which is a large number for a notebook. Also, there are eight other cells with show_limit=100, 50 or 10. Could you limit the number of images, or divide the notebook into two or three? The model would be saved so that it can be retrieved later in other notebooks to show the results if needed.

mariacheriksson commented 1 year ago

Hi @eliselavy, (@TomasSkotare for info),

Regarding your comments above: that is right, sometimes we output 200 images, sometimes 100, sometimes 50, and it is absolutely correct that this adds up to a lot of pictures. We are working with audiovisual content as source materials and, by nature, this involves dealing with lots of images/frames. This is also the reason why it would be fantastic if we could collapse the cell outputs into a scrollable cell (so that readers wouldn't have to scroll down excessive amounts of images when reading the hermeneutical layer of our article). Again, this feature is normally built into Jupyter notebooks, but it disappears in the journal's rendering of the text.

I'm afraid tagging the cell outputs as a figure won't help, since readers will still be forced to scroll through the same amount of information (+ it makes it very difficult to grasp which metadata belongs to which image sequence). You also mentioned using the toggle feature to partially hide the cell outputs, but we are unsure how this feature can be implemented and transported in the journal's rendering of the article. Perhaps you could clarify? I'm also not sure I understand how you mean that dividing the notebook into two or three could help in this case, but maybe you could explain a bit more?

As you suggest, we could limit the number of images but this would greatly reduce and limit our possibilities to showcase our methods (which is a key feature of the special issue on Digital tools that the article would be part of). It would also involve rewriting a very substantial part of the article, and this is why we really want to double-check if it wouldn't be possible to add a feature for collapsing cell outputs. Once more, what we are asking for is exactly the same feature that is automatically shown when you look at our article in a Jupyter Notebook browser window (as opposed to the journal's article preview).

mariacheriksson commented 1 year ago

Hello again @eliselavy,

You wrote the following before: "about the size of the images, don't worry about it for now; we have a GitHub action that retags all the pictures by adding width/height tags in order to avoid this blank margin (I can share it with you for testing), and this should be documented in the guidelines"

Would it be possible to have a look at the test? We are fixing up the final things with our pictures and it would be great to double-check how things will look in the final article!

Best, Maria

mariacheriksson commented 1 year ago

Hi @eliselavy,

One final question: we are doing a final proof-reading of the references and can't get the Zotero plugin to work as we would like. For instance, references that have been deleted from the main text remain in the bibliography at the end of the article. When references are updated in the linked Zotero account (for instance, adding a missing publication date), this is also not imported into the article. Furthermore, we were wondering how to add page numbers for citations in the correct way. Thanks for your help!

eliselavy commented 1 year ago

Hi @eliselavy,

One final question: we are doing a final proof-reading of the references and can't get the Zotero plugin to work as we would like. For instance, references that have been deleted from the main text remain in the bibliography at the end of the article. When references are updated in the linked Zotero account (for instance, adding a missing publication date), this is also not imported into the article. Furthermore, we were wondering how to add page numbers for citations in the correct way. Thanks for your help!

Hi @mariacheriksson

eliselavy commented 1 year ago

Hi @eliselavy, (@TomasSkotare for info),

Regarding your comments above: that is right, sometimes we output 200 images, sometimes 100, sometimes 50, and it is absolutely correct that this adds up to a lot of pictures. We are working with audiovisual content as source materials and, by nature, this involves dealing with lots of images/frames. This is also the reason why it would be fantastic if we could collapse the cell outputs into a scrollable cell (so that readers wouldn't have to scroll down excessive amounts of images when reading the hermeneutical layer of our article). Again, this feature is normally built into Jupyter notebooks, but it disappears in the journal's rendering of the text.

I'm afraid tagging the cell outputs as a figure won't help, since readers will still be forced to scroll through the same amount of information (+ it makes it very difficult to grasp which metadata belongs to which image sequence). You also mentioned using the toggle feature to partially hide the cell outputs, but we are unsure how this feature can be implemented and transported in the journal's rendering of the article. Perhaps you could clarify? I'm also not sure I understand how you mean that dividing the notebook into two or three could help in this case, but maybe you could explain a bit more?

As you suggest, we could limit the number of images but this would greatly reduce and limit our possibilities to showcase our methods (which is a key feature of the special issue on Digital tools that the article would be part of). It would also involve rewriting a very substantial part of the article, and this is why we really want to double-check if it wouldn't be possible to add a feature for collapsing cell outputs. Once more, what we are asking for is exactly the same feature that is automatically shown when you look at our article in a Jupyter Notebook browser window (as opposed to the journal's article preview).

@mariacheriksson We are going to have a JDH meeting about this next week. I will get back to you.

mariacheriksson commented 1 year ago

Fabulous! Thank you so much for your help @eliselavy!

eliselavy commented 1 year ago

Hello again @eliselavy,

You wrote the following before: "about the size of the images, don't worry about it for now; we have a GitHub action that retags all the pictures by adding width/height tags in order to avoid this blank margin (I can share it with you for testing), and this should be documented in the guidelines"

Would it be possible to have a look at the test? We are fixing up the final things with our pictures and it would be great to double-check how things will look in the final article!

Best, Maria

I added the GitHub action to the JDH template repository https://github.com/C2DH/template_repo_JDH/blob/main/.github/workflows/github-actions-publishing.yml; you can copy the code of github-actions-publishing.yml and add this action to your repository:

Screenshot 2023-04-27 at 16 09 05
Screenshot 2023-04-27 at 16 11 51

You need to put the name of your notebook in the notebook parameter in the file github-actions-publishing.yml:

with:
  notebook: 'article.ipynb'
  output_notebook: 'skim-article.ipynb'

After that, commit the file and run the workflow:

Screenshot 2023-04-27 at 16 16 02

A new file will be produced, skim-article.ipynb, in which all the figures are tagged with their width and height.
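For orientation, the effect is comparable to giving a displayed image explicit dimensions yourself (the pixel values below are placeholders):

from IPython.display import Image, display

# An explicit width/height avoids the blank margin created by the default figure height.
display(Image("./media/demo/distance_illustration.jpg", width=600, height=200))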

eliselavy commented 1 year ago

Save the output in the GitHub repository to give further access ("Give a taste of DH articles").

eliselavy commented 6 months ago

Locally:

Error when installing the package: pip install video-reuse-detector

https://pypi.org/project/video-reuse-detector/

----------------------------------------
  ERROR: Failed building wheel for grpcio
  Running setup.py clean for grpcio
Failed to build grpcio
Installing collected packages: tensorflow-estimator, importlib-metadata, markdown, grpcio, tensorboard-data-server, absl-py, tensorboard-plugin-wit, oauthlib, requests-oauthlib, google-auth-oauthlib, tensorboard, libclang, keras, google-pasta, astunparse, termcolor, tensorflow-io-gcs-filesystem, opt-einsum, gast, tensorflow
  Attempting uninstall: importlib-metadata
    Found existing installation: importlib-metadata 2.0.0
    Uninstalling importlib-metadata-2.0.0:
      Successfully uninstalled importlib-metadata-2.0.0
    Running setup.py install for grpcio ... -

I am using Python 3.8.5; video-reuse-detector requires Python >=3.7, <3.11.

Error installing the tensorflow package, even though https://pypi.org/project/tensorflow/2.11.0/ requires Python >=3.7.

Launched on myBinder: added the kaleido package.

Still running step 2 (9%).

eliselavy commented 6 months ago

OK, I can install the package video-reuse-detector with Python 3.7.10.

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.202)

Nbconvert:

~/.pyenv/versions/anaconda3-2020.02/lib/python3.7/site-packages/vrd/frame_extractor.py in _extract_png_from_video(self, video_path, frame_directory)
    180                 .run(capture_stdout=True, capture_stderr=True)
    181             )
--> 182         except ffmpeg.Error as error:
    183             # TODO: Use logger instead? Throw exception?
    184             print("stdout:", error.stdout.decode("utf8"))

AttributeError: module 'ffmpeg' has no attribute 'Error'
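(One possible cause, not verified here: the unrelated ffmpeg package from PyPI being installed instead of ffmpeg-python, which is the package that exposes ffmpeg.Error. A quick check:)

import ffmpeg

# ffmpeg-python exposes an Error class; the unrelated "ffmpeg" PyPI package does not.
if hasattr(ffmpeg, "Error"):
    print("ffmpeg-python bindings detected")
else:
    print("this 'ffmpeg' module is probably not ffmpeg-python; try: pip install ffmpeg-python")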
TomasSkotare commented 6 months ago

Hello! I've gotten some messages about this, and perhaps I can help.

The version that is up was created about a year ago and did work at the time; however, there have been changes in Python versions and so on that make it more difficult to run now. Mainly, a new version of tensorflow has been released.

I do not have a Mac computer available to me right at this moment, but in general the easiest way to get it to run would be using Docker. Is this an option for you?

eliselavy commented 6 months ago

Fixed with ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers

@TomasSkotare Indeed, Docker would be better, but there is a problem: the file 'demo_unwanted_frames.xlsx' is missing from the demo folder.

TomasSkotare commented 6 months ago

@eliselavy That seems to be the case, which is interesting. I will supply this early version of the file, which should work; it simply deletes one sequence in each file, as a demonstration.

This file should be placed in the root directory, the same directory in which the README.md file is located.

demo_unwanted_frames.xlsx

eliselavy commented 5 months ago

Sent to peer-review