abi / screenshot-to-code

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
https://screenshottocode.com
MIT License
61.57k stars 7.52k forks

Support multiple selection in "Select and Edit" #418

Open radrad opened 1 month ago

radrad commented 1 month ago

Describe the bug

I am using VS Code Insiders in admin mode.

In the backend .env I entered my AI keys:

OPENAI_API_KEY=sk-2siLny...
ANTHROPIC_API_KEY=sk-ant-api0...

When I drag/drop the .mp4 video below: https://github.com/user-attachments/assets/22713a47-4d23-44e9-b83f-dcb774ebbcc8 I get a notification dialog: "Error assembling prompt. Contact support at support@picoapps.xyz"

How can I use the latest models, and what is the code I should change? I want to use the latest OpenAI model, o1-preview (which points to o1-preview-2024-09-12), and the latest Anthropic model, claude-3-5-sonnet-latest (which points to claude-3-5-sonnet-20241022).

I am confused about where in the code I can designate the latest versions.

When I drag/drop .png image:

screenshot1

I cannot see Option 1 rendering. What is Option 1 supposed to show? OpenAI-based generation?

How can I use the latest o1 model?

frontend\src\lib\models.ts:

```typescript
// Keep in sync with backend (llm.py)
// Order here matches dropdown order
export enum CodeGenerationModel {
  CLAUDE_3_5_SONNET_2024_06_20 = "claude-3-5-sonnet-20240620",
  GPT_4O_2024_05_13 = "gpt-4o-2024-05-13",
  GPT_4_TURBO_2024_04_09 = "gpt-4-turbo-2024-04-09",
  GPT_4_VISION = "gpt_4_vision",
  CLAUDE_3_SONNET = "claude_3_sonnet",
}

// Will generate a static error if a model in the enum above is not in the descriptions
export const CODE_GENERATION_MODEL_DESCRIPTIONS: {
  [key in CodeGenerationModel]: { name: string; inBeta: boolean };
} = {
  "gpt-4o-2024-05-13": { name: "GPT-4o", inBeta: false },
  "claude-3-5-sonnet-20240620": { name: "Claude 3.5 Sonnet", inBeta: false },
  "gpt-4-turbo-2024-04-09": { name: "GPT-4 Turbo (deprecated)", inBeta: false },
  gpt_4_vision: { name: "GPT-4 Vision (deprecated)", inBeta: false },
  claude_3_sonnet: { name: "Claude 3 (deprecated)", inBeta: false },
};
```

Console log for backend:

Using openAiApiKey from client-side settings dialog
Using anthropicApiKey from client-side settings dialog
Using official OpenAI URL
Generating react_tailwind code in video mode using Llm.CLAUDE_3_5_SONNET_2024_06_20...
Status (variant 0): Generating code...
Status (variant 1): Generating code...
C:\Users\Greg\AppData\Local\Temp\tmpkwxwbc8s.mp4
Error assembling prompt. Contact support at support@picoapps.xyz
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\ffmpeg_reader.py", line 285, in ffmpeg_parse_infos
    line = [l for l in lines if keyword in l][index]


IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\uvicorn\protocols\websockets\websockets_impl.py", line 250, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\errors.py", line 149, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\cors.py", line 75, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\routing.py", line 341, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\starlette\routing.py", line 82, in app
    await func(session)
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\fastapi\routing.py", line 289, in app
    await dependant.call(**values)
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\routes\generate_code.py", line 234, in stream_code     
    prompt_messages, image_cache = await create_prompt(params, stack, input_mode)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\prompts\__init__.py", line 72, in create_prompt        
    prompt_messages = await assemble_claude_prompt_video(video_data_url)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\video\utils.py", line 21, in assemble_claude_prompt_video
    images = split_video_into_screenshots(video_data_url)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "J:\k8s\ArgoCD\Git\Maui\The Path to Self-Transformation\Automation\screenshot-to-code\backend\video\utils.py", line 79, in split_video_into_screenshots
    clip = VideoFileClip(temp_video_file.name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\VideoFileClip.py", line 88, in __init__
    self.reader = FFMPEG_VideoReader(filename, pix_fmt=pix_fmt,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\ffmpeg_reader.py", line 35, in __init__
    infos = ffmpeg_parse_infos(filename, print_infos, check_duration,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Greg\.virtualenvs\agents_with_json_mode_only-I1PZcyP6\Lib\site-packages\moviepy\video\io\ffmpeg_reader.py", line 289, in ffmpeg_parse_infos 
    raise IOError(("MoviePy error: failed to read the duration of file %s.\n"
OSError: MoviePy error: failed to read the duration of file C:\Users\Greg\AppData\Local\Temp\tmpkwxwbc8s.mp4.
Here are the file infos returned by ffmpeg:

ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
C:\Users\Greg\AppData\Local\Temp\tmpkwxwbc8s.mp4: Permission denied
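A side note on that final "Permission denied" line: on Windows, `tempfile.NamedTemporaryFile` keeps the file handle open, and a file opened that way cannot be reopened by another process such as ffmpeg, which surfaces as exactly this error on the temp path. Whether that is what `backend/video/utils.py` is hitting is an assumption; a minimal sketch of the usual workaround (close the handle before passing the path on, delete manually) looks like this, with `write_temp_video` being a hypothetical helper:

```python
import os
import tempfile

def write_temp_video(video_bytes: bytes) -> str:
    # On Windows, a NamedTemporaryFile that is still open cannot be
    # reopened by another process (e.g. ffmpeg), which shows up as
    # "Permission denied". Create it with delete=False, close it,
    # and hand the path around instead.
    tmp = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
    try:
        tmp.write(video_bytes)
    finally:
        tmp.close()
    # Caller is responsible for os.remove(path) when done.
    return tmp.name
```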
abi commented 1 month ago

Re: images, I made it a little trickier to set the model in code with the newest change that supports multiple options.

If you pull the latest and have both Anthropic and OpenAI keys set, it will use the latest Claude (20241022). See https://github.com/abi/screenshot-to-code/blob/8ee26ff566e5f4502f142e94fe992832d52ea0db/backend/routes/generate_code.py#L312

We currently use GPT_4O_2024_05_13, which is one update behind: https://github.com/abi/screenshot-to-code/blob/8ee26ff566e5f4502f142e94fe992832d52ea0db/backend/routes/generate_code.py#L299 You can update that to the latest if you want.
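To make the "keep in sync with backend (llm.py)" step concrete, here is a hypothetical Python sketch of what adding newer dated identifiers to a backend model enum could look like. The member names, and whether llm.py is actually structured this way, are assumptions, and gpt-4o-2024-08-06 is offered only as an example of a later dated GPT-4o identifier:

```python
from enum import Enum

# Hypothetical sketch mirroring the frontend CodeGenerationModel enum;
# the real enum in backend/llm.py may use different member names.
class Llm(Enum):
    CLAUDE_3_5_SONNET_2024_06_20 = "claude-3-5-sonnet-20240620"
    CLAUDE_3_5_SONNET_2024_10_22 = "claude-3-5-sonnet-20241022"  # newer Claude
    GPT_4O_2024_05_13 = "gpt-4o-2024-05-13"
    GPT_4O_2024_08_06 = "gpt-4o-2024-08-06"  # example of a later GPT-4o release
```

The value strings are what gets sent to the provider APIs, so they must match the providers' published model identifiers exactly.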

You can't use o1-preview-2024-09-12 because that doesn't support image input as far as I know.

I'll make it easier to choose models in the future.

Re: video, this is a known issue. If you convert the format of the video using a video converter, it should work. Some browsers don't set the duration correctly when capturing video, so it doesn't work. You could also try a different browser.

radrad commented 1 month ago

What about having both Option 1 (which I don't get, even though I have both Anthropic and OpenAI keys in .env) and Option 2 (which I do get)?

I find that hard-coding models in multiple places makes future updates very hard to maintain. I understand there are differences in the codebase in how models from different providers are treated (to enable or disable some features). This is what I changed; feel free to apply this patch if my changes are appropriate for bringing in the latest models from both Anthropic and OpenAI: my_changes.patch

Re: "If you can convert the format of the video using a video converter, it should work." What exactly should I convert in the video? image

The video (which you can check at https://github.com/user-attachments/assets/22713a47-4d23-44e9-b83f-dcb774ebbcc8) was captured with the Snagit tool, and you can see from the file's properties that it does have Length and other video metadata (visible in the picture).

Where exactly in the code am I getting this error?

Next: when selecting fine-grained changes, it would be good to accumulate more than one selection. I find that after selecting an HTML element and providing a desired update, code re-generation happens immediately. I would prefer that multiple selections and update prompts were possible before re-generating.

abi commented 1 month ago

Yeah, I will look to support the newest GPT-4o model. I need to do some testing to ensure quality before switching to it.

I think a good fix is to just re-encode it as MP4. If that's confusing, you could convert MP4 to WebM, or MP4 -> WebM -> MP4. As far as I know, it's an encoding issue with the video when the duration error shows up.

Good suggestion re: more than one selection. I'm also exploring newer models like Llama on Groq, so the change is instant.

radrad commented 1 month ago

Snagit is a professional screen video capture tool. I cannot see any problem with the .mp4 it creates. I did what you suggested (MP4 -> WebM -> MP4), and here is the same file after those conversion steps: https://github.com/user-attachments/assets/fe15a9de-ab1a-4423-ac6b-9fa7da46e239

Can you try with my original video and this one?

Can you provide me with a couple of videos that do work, to see if there is some other problem?

What about producing both Option 1 and Option 2? Right now I only get Option 2. When would Option 1 generate code?

abi commented 4 weeks ago
Screenshot 2024-10-28 at 6 30 59 PM

The video you provided works for me. What error do you get?

If Option 1 isn't working, please share the backend logs. But even without that, my guess would be that your Anthropic key isn't right.