Open mano3-1 opened 3 weeks ago
Does your dataset has mixed modality? It does not supprot mixed modality for now. I'm finding a way for it.
I've raised PR #14 to address handling of mixed-modality datasets. Specifically, I'm excluding all_pixel_values and all_image_grid_thw from the final data_dict and implemented a data sampler to maintain homogeneous batches.
Could you help identify possible causes for a broadcasting error here?
@mano3-1 It's a bit wierd I can't reproduce a same issue. Also I'm looking at the PR you've made, Thanks for the PR I'll review it.
Also, in your code I think you are using the latest transformers. Can you use the version in the README? It can load the model but, I'm not sure the mrope is properly loaded from the newest version.
Hey, I installed the recommended version of transformers (from your readme). Still getting the same result (I tried with image-text pairs)
@mano3-1 It's weird that it works fine for me right now. I'll keep debug it.
@mano3-1 Can you share your part of the data you are training? I think it's a type of a data problem because I'm testing with my sample of dataset but it dosen't show any errors.
I am also running into this problem right now, my dataset does not need to be mixed modality for my workload; how would one format according to the llava specificiation without it being mixed modality?, do you have an example of an entry?
@Julian2002 @mano3-1
[
{
"id": "traffic_img1442_jpg",
"image": "AccidentDetection/train/traffic_img1442_jpg.rf.0c12f50f745b52cff83669488676faac.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nCan you describe what\u2019s happening in the image?"
},
{
"from": "gpt",
"value": "In the image, there appears to be a scene of destruction where a vehicle is partially submerged in what looks like a swimming pool. The vehicle seems to have suffered significant damage, with parts of it distorted or broken. The pool has a blue tiled surface, and the water looks disturbed, indicating the vehicle has either recently entered the pool or some activity has occurred. The setting seems unusual since vehicles are not typically found in swimming pools, suggesting an accident or a staged event for entertainment or testing purposes."
},
{
"from": "human",
"value": "What objects are identifiable in the image, and where are they located?"
},
{
"from": "gpt",
"value": "The primary object in the image is a damaged vehicle. It is located at the center of the image, running lengthwise along the pool. The vehicle's rear is towards the bottom of the image, and the front is towards the top part, mostly submerged in water. The pool itself is another identifiable object, with its blue tiled lining clearly visible throughout the image. There is also a reddish-pink border running along the right side of the pool. The words \"DestructionNation\" are overlaid in the upper-left corner of the image."
},
{
"from": "human",
"value": "Based on the visual cues, what can be inferred about the possible purpose or convalue of this scene?"
},
{
"from": "gpt",
"value": "From the visual cues, it can be inferred that this scene is likely set up for an event or demonstration involving the destruction of vehicles, possibly for entertainment, research, or testing purposes. The presence of the value \"DestructionNation\" suggests a branding associated with destruction or crash testing, which might indicate that the scene is either part of a video, a show, or an experimental setup focusing on vehicle destruction dynamics. Additionally, the unusual placement of the vehicle in a swimming pool and the apparent significant damage imply that this might be a staged incident rather than an actual accident. This type of scene is frequently seen in scenarios where vehicles are put through extreme conditions to study impacts, gather data, or entertain viewers by displaying dramatic crashes."
}
]
},
{
"id": "SEV_22_139_jpg",
"image": "accident3.0/train/SEV_22_139_jpg.rf.c1bfaa1cd384988e55a7f37de9e8cbd5.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nWhat could be the possible cause of the congestion in the image, and how might it affect the vehicles on the highway?"
},
{
"from": "gpt",
"value": "In the image, depicted is a scene on a highway with considerable congestion. A likely cause of this congestion is the overturned vehicle in the middle of the road. The orientation and position of this vehicle suggest an accident or a loss of control which resulted in the vehicle flipping onto the road. To the left, a few cars are cautiously navigating around the incident, potentially slowing down to avoid collision and to assess the situation. In the background, more vehicles are seen approaching the congestion zone, indicating a build-up of traffic. The presence of the overturned vehicle necessitates careful maneuvering, potentially causing additional delays as other vehicles have to slow down significantly or change lanes to safely pass by. This bottleneck can result in a cascading effect, leading to slower overall traffic flow and prolonged congestion on the highway. Verifying the scene, no emergency response teams are visible, which might further exacerbate the delay until the situation is addressed."
}
]
},
{
"id": "argusAcc321_jpg",
"image": "accident3.0/train/argusAcc321_jpg.rf.5bb271f3699a9db06865c12f6f77b682.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nBased on the visual elements observed, what potential traffic situation is depicted here, and how might it impact the flow of vehicles at this location?"
},
{
"from": "gpt",
"value": "The image shows a white car and a black car positioned close to each other, possibly indicating a traffic incident, such as a minor collision or contact. Both cars are at an intersection angle, suggesting they might have maneuvered into each other\u2019s paths. The proximity of the vehicles could cause a temporary obstruction, impacting the traffic flow by forcing other drivers to slow down or navigate around the stopped cars. If this is a busy area, it might lead to congestion or cause other vehicles to reroute. Additionally, without visible pedestrians, motorcycles, or signs, the focus remains solely on the cars, indicating nearby traffic measures might need to address vehicle-only issues cohesively. Ensuring such locations have clear signage and perhaps traffic lights can help mitigate similar incidents in the future."
}
]
},
{
"id": "acc1252_jpg",
"image": "accident3.0/train/acc1252_jpg.rf.71f76bf5073e3012383af6ffd8f752dd.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nBased on the given image, describe the potential causes and consequences of the situation depicted? Include the interactions between the objects and people present."
},
{
"from": "gpt",
"value": "The image shows an overturned car in the middle of a road, which likely indicates a significant accident. There are a notable number of pedestrians and riders gathered in proximity to the overturned vehicle, suggesting that they may be either witnesses or people offering help. An emergency response vehicle, specifically an ambulance, is also visible nearby, suggesting medical assistance is either in progress or being prepared. The presence of multiple cars near the scene, some appearing to be pulled over, implies that the accident might have caused traffic disruptions. The truck and bus are positioned further back, potentially aware of the incident and thus stopping short. This setup indicates a serious incident with potential injuries, given the medical presence, and highlights the role of bystanders and responders in handling road accidents. The consequences likely include traffic delays, injuries from the crash, and an ongoing response to attend to any victims."
}
]
}
]
This is the dataset I'm using for testing right now.
Hi, Thank for such an wonderful repo. I was trying to train the model with a custom dataset using the lora script and getting the below error:
Any help would be appreciated!
Thanks in Advance